Abstract

We present a new perfect simulation algorithm for stationary chains having unbounded variable length memory. This is the class of infinite memory chains for which the family of transition probabilities is represented by a probabilistic context tree. We do not assume any continuity condition: our condition is expressed in terms of the structure of the context tree. More precisely, the length of the contexts is a deterministic function of the distance to the last occurrence of some determined string of symbols. It turns out that the resulting class of chains can be seen as a natural extension of the class of chains having a renewal string. In particular, our chains exhibit a visible regeneration scheme.
1. Introduction

We introduce a new class of discrete time stochastic chains $X = (X_n)_{n\in\mathbb{Z}}$, taking values in a countable alphabet $A$. These chains have unbounded variable length memory. This means that the state of the chain at time 0 depends on an unbounded suffix of the past $\dots X_{-2}X_{-1}$, whose length depends on the values assumed by the chain in the past. In the present case, the length of this suffix depends on the distance to the last occurrence of a given finite reference string $a_{-l}\dots a_{-1}$ of symbols of $A$. More precisely, there exists a function $f : \mathbb{N} \to \mathbb{N}$ such that, if the last occurrence of $a_{-l}\dots a_{-1}$ is at distance $k$ in the past, that is, if $X_{-k-l}\dots X_{-k-1} = a_{-l}\dots a_{-1}$ and for $j = l, \dots, k+l-1$ we have $X_{-j}\dots X_{-j+l-1} \neq a_{-l}\dots a_{-1}$, then we need to know $X_{-f(k)-l-k}\dots X_{-k-l-1}X_{-k-l}\dots X_{-1}$ in order to decide the state of the chain at time 0:

$$\cdots\ \overbrace{\underbrace{X_{-f(k)-l-k}\cdots X_{-k-l-1}}_{\text{length } f(k)}\ \underbrace{X_{-k-l}\cdots X_{-k-1}}_{\text{last occurrence of } a_{-l}^{-1}}\ X_{-k}\cdots X_{-1}}^{\text{suffix of the past we need to know to decide } X_0}$$
In other words, the family of transition probabilities $P$ for these chains is such that

$$P(\cdot\,|\,\dots b_{-2}b_{-1}) = P(\cdot\,|\,\dots c_{-2}c_{-1})$$

whenever the last occurrence of $a_{-l}\dots a_{-1}$ is at distance $k$ in $\dots b_{-2}b_{-1}$ and

$$b_{-f(k)-l-k}\dots b_{-2}b_{-1} = c_{-f(k)-l-k}\dots c_{-2}c_{-1}.$$



Observe that on $A = \{1, 2\}$, if the reference string is the symbol 2 and the function $f$ is identically 0, we obtain the renewal chain with symbol 2 as renewal symbol. For this reason, we say that this class of stochastic chains generalizes the class of chains having a renewal string.

We highlight three main parameters for the study of this class of chains: the size of the reference string, the set of transition probabilities to the symbols of this reference string, and the deterministic function $f$.

We ask the following questions: (i) What should we assume on these parameters in order to guarantee that there exists a stationary chain compatible with such a family of transition probabilities? (ii) Is this stationary chain unique? (iii) What are the statistical properties of this chain? (iv) Does the chain exhibit a regeneration scheme, as in the renewal case?

It is important to observe that the existing results of the literature on chains of infinite order cannot answer these questions which, at least in our view, are quite natural. The main reason for this is that, since the seminal papers of Onicescu and Mihoc (1935), the literature has focused on the so-called continuity assumption, which is not assumed here. In fact, the way we described the family of transition probabilities of our chains fits exactly the notion of probabilistic context trees, introduced by Rissanen (1983). It follows that the right framework for our study is that of probabilistic context trees and not that of continuous families. Moreover, there is, so far, no "well adapted" (in a sense we will make clear later in this paper) criterion for the existence and uniqueness of the stationary chain compatible with a given probabilistic context tree.

As a consequence, the main method we use to answer the above questions is constructive: we give sufficient conditions on our parameters ensuring that we can perfectly simulate the chain from the stationary distribution. This is our first main result (Algorithms 1 and 2 and Theorem 5.1). As far as we know, the only perfect simulation algorithm for chains of infinite order to date was the one of Comets et al. (2002), which applies in the continuous framework. Our algorithm shares several features with theirs, and this is only due to the fact that both algorithms use the coupling from the past method (CFTP in the sequel) introduced by Propp and Wilson (1996) to perfectly simulate Markov chains.

As a byproduct of Theorem 5.1, we obtain sufficient conditions for the existence and the uniqueness of the stationary chain (Corollary 5.1). We also show that this stationary chain has a hidden regeneration scheme, and that the expected time between two consecutive regeneration times is finite (Corollary 5.2). The term hidden means that we cannot detect it on the realization of the chain. This regeneration scheme arises from the perfect simulation algorithm, and is also similar to the one introduced by Comets et al. (2002).



The last main result of this paper is the existence, under the same conditions, of a visible regeneration scheme (Theorem 5.2). This regeneration scheme can be detected directly on the realization of the chain. But since our chains are not necessarily renewal, detecting the regeneration scheme means, in general, knowing the entire future of the chain.

We would like to emphasize that the continuity assumption was originally introduced by Doeblin and Fortet (1937) as a technical assumption enabling them to obtain results (see the discussion therein). As they say, it is quite natural in this perspective to assume that the transition probability from $a_{-\infty}^{-1}$ to $a$ does not depend too much on the remote symbols of $a_{-\infty}^{-1}$. However, continuity is only one way to translate this assumption mathematically. The present paper gives a different one. The success of the continuity assumption came three decades later (in the 1960s), in part because it is related to some well-behaved dynamical systems and statistical mechanics models through the Gibbs formalism (see for example Bowen (2008)). However, from the point of view of applications, it is not clear that real phenomena have to be represented within the continuous framework. It seems to us quite natural (and also of mathematical interest) to explore the non-continuous world.

Therefore, the interest of the present work is threefold. First, it extends the class of renewal chains to a class of stochastic chains having a visible regeneration scheme which is not a renewal scheme. Second, it gives an appropriate condition on the form of the context tree guaranteeing that the unique compatible stationary chain can be perfectly simulated. Finally, it seems to be the first attempt in the literature on chains of infinite memory to consider the non-continuous case.

This paper is organized as follows. Section 2 gives the basic definitions and notation, introducing in particular the context tree framework. In Section 3 we give an example which motivates the above discussion and explains why we were led to consider such a class of stochastic chains. Section 4 explains our assumptions more precisely using the context tree framework. In Section 5 we sketch the perfect simulation algorithm and state the results of this paper. Sections 6, 7 and 8 are dedicated to the proofs of the results. In Section 9 we present the complete perfect simulation algorithm, together with some simulations of the example of Section 3. We end the paper with some literature on the areas involved in this paper.



2. Basic definitions 

Let $A$ be a countable alphabet. Given two integers $m \le n$, we denote by $a_m^n$ the string $a_m \dots a_n$ of symbols in $A$. For any $m \le n$, the length of the string $a_m^n$ is denoted by $|a_m^n|$ and is defined by $|a_m^n| = n - m + 1$. For any $n \in \mathbb{Z}$, we will use the convention that $a_{n+1}^n = \emptyset$, and naturally $|a_{n+1}^n| = 0$. Given two strings $v$ and $v'$, we denote by $vv'$ the string of length $|v| + |v'|$ obtained by concatenating the two strings. The concatenation of strings is also extended to the case in which $v$ denotes a semi-infinite sequence, that is, $v = v_{-\infty}^{-1}$. If $n$ is a positive integer and $v$ a finite string of symbols in $A$, we denote by $v^n = vv\dots v$ the concatenation of $n$ times the string $v$. We denote

$$A^{-\mathbb{N}} = A^{\{\dots,-2,-1\}} \qquad\text{and}\qquad A^\star = \bigcup_{j=0}^{+\infty} A^{\{-j,\dots,-1\}},$$

which are, respectively, the set of all infinite strings of past symbols and the set of all finite strings of past symbols. The case $j = 0$ corresponds to the empty string $\emptyset$.

2.1. Probabilistic context tree 

We say that a string $s$ is a suffix (resp. prefix) of another string $v$ if $|s| \le |v|$ and $v_{-|s|}^{-1} = s$ (resp. $v_{-|v|}^{-|v|+|s|-1} = s$).

Definition 2.1. A subset $\tau$ of $A^\star \cup A^{-\mathbb{N}}$ is a tree if no string $s \in \tau$ is a suffix of another string $v \in \tau$. This property is called the suffix property. When $\sup\{|v| : v \in \tau\} = +\infty$, we say that the tree $\tau$ is unbounded.
Definition 2.2. A tree $\tau$ is complete if any element $a_{-\infty}^{-1}$ of $A^{-\mathbb{N}}$ has a suffix belonging to $\tau$. The suffix property implies that this suffix is unique. We call it the context of the sequence $a_{-\infty}^{-1}$ and it is denoted by $c_\tau(a_{-\infty}^{-1})$. A complete tree is called a context tree.

We also extend the notion of context to finite strings: for any $a_m^n \in A^\star$, $m \le n$, we put $c_\tau(a_m^n) = v$ if $v$ is a suffix of $a_m^n$ belonging to $\tau$. If no context of $\tau$ is a suffix of $a_m^n$, we use the convention $c_\tau(a_m^n) = \emptyset$. In particular, $c_\tau(\emptyset) = \emptyset$.
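To make the finite-string convention concrete, here is a minimal Python sketch of $c_\tau$ for a finite tree. It assumes that strings are stored as tuples in time order (most recent symbol last, so a suffix is a tail of the tuple) and that the tree is given as a set of such tuples with the suffix property. The function name `context_of` and the three-context tree `TREE` are hypothetical illustrations, not taken from the text.

```python
def context_of(past, tree):
    """Return c_tau(past) for a finite past and a finite context tree.

    Strings are tuples in time order (oldest symbol first, most recent
    symbol last). `tree` is a set of tuples assumed to have the suffix
    property, so at most one of its elements is a suffix of `past`.
    Returns () (the empty string) when no context is a suffix of `past`.
    """
    for length in range(1, len(past) + 1):
        suffix = tuple(past[len(past) - length:])
        if suffix in tree:
            return suffix
    return ()


# Hypothetical complete tree on A = {1, 2}: if the last symbol is 1 the
# context is (1,); otherwise we also look at the symbol before it.
TREE = {(1,), (1, 2), (2, 2)}

print(context_of((2, 2, 1), TREE))  # (1,)
print(context_of((1, 1, 2), TREE))  # (1, 2)
print(context_of((2,), TREE))       # () : too short to contain a context
```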

Definition 2.3. A probabilistic context tree on $A$ is an ordered pair $(\tau, p)$ such that

1. $\tau$ is a context tree;

2. $p = \{p(\cdot|v) : v \in \tau\}$ is a family of transition probabilities over $A$.



Examples of probabilistic context trees are shown in Figures 1(a) (for the bounded case) and 1(b) (for the unbounded case).

We draw the attention of the reader to the following notation: if $v = v_{-|v|}\dots v_{-1}$ is a context of a context tree $\tau$, then $v_{-i}$, $i = 1, \dots, |v|$, denotes the path from the root to the leaf in the tree representation of $\tau$. In the conditional probability $p(a|v)$, we will swap the order of the symbols of the context $v$ to keep the overall temporal order:

$$p(a|v) = p(a\,|\,v_{-1}\dots v_{-|v|})$$

is the probability that at time $n$, say, we put symbol $a$ given that at time $n-1$ we have $v_{-1}$, at time $n-2$ we have $v_{-2}$, and so on.



The context tree illustrated in Figure 1(a) is defined on $\{1,2,3\}$, and to each context $v$ we assign, in the boxes, the transition probabilities $p(1|v)$, $p(2|v)$, $p(3|v)$. In this paper we only consider unbounded context trees, but we include this example to recall that the model was introduced by Rissanen (1983) in the bounded case. The context tree illustrated in Figure 1(b) is defined on $\{1,2\}$; it corresponds to the discrete renewal chain with renewal symbol 2. This means that the successive occurrences of 2 "split" the realization of the chain into i.i.d. blocks. For any $i \ge 0$, $p_i$ denotes the transition probability $p(2|1^i2)$. The transition probability $p(1|1^i2)$ is $1 - p_i$.
More general examples of unbounded context trees (without specifying the transition probabilities) are given by Figures 2, 3(a) and 3(b).




Figure 1: Examples of probabilistic context trees.
Definition 2.4. We say that a symbol $a$ of $A$ is $\epsilon$-regular for a probabilistic context tree $(\tau, p)$ if

$$\inf_{v\in\tau} p(a|v) \ge \epsilon > 0. \qquad (1)$$

A string $w$ of $A^\star$ is also said to be $\epsilon$-regular if all symbols of $A$ appearing in $w$ are $\epsilon$-regular. For any fixed $\epsilon > 0$, we denote by $\mathcal{S}$ the set of elements of $A$ which are $\epsilon$-regular.
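Definition 2.4 can be checked mechanically when the tree is finite. The Python sketch below assumes a hypothetical table `P` mapping each context (a tuple) to its transition probabilities; for a finite table the infimum in (1) is a minimum, whereas for an unbounded tree it would have to be obtained analytically. The helper names are illustrative only.

```python
def regular_symbols(p, alphabet, eps):
    """Set of eps-regular symbols, i.e. those a with inf_v p(a|v) >= eps, as in (1).

    `p` maps each context (a tuple) to a dict {symbol: p(symbol|context)}.
    With finitely many contexts the infimum is a minimum.
    """
    return {a for a in alphabet
            if min(dist.get(a, 0.0) for dist in p.values()) >= eps}


def is_regular_string(w, regular):
    """A string is eps-regular when every symbol appearing in it is."""
    return all(a in regular for a in w)


# Hypothetical transition probabilities on A = {1, 2}.
P = {(1,): {1: 0.7, 2: 0.3}, (1, 2): {1: 0.5, 2: 0.5}, (2, 2): {1: 0.1, 2: 0.9}}
S = regular_symbols(P, [1, 2], eps=0.25)
print(S)                             # {2}
print(is_regular_string((2, 2), S))  # True
print(is_regular_string((1, 2), S))  # False
```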

2.2. Unbounded variable length memory chains 

Definition 2.5. We say that a stochastic chain $X = (X_n)_{n\in\mathbb{Z}}$ of law $\mathbb{P}$ is compatible with a probabilistic context tree $(\tau, p)$ if for $\mathbb{P}$-a.e. past $a_{-\infty}^{-1} \in A^{-\mathbb{N}}$ and any $a \in A$ we have

$$\mathbb{P}(X_0 = a\,|\,X_{-\infty}^{-1} = a_{-\infty}^{-1}) = p(a\,|\,c_\tau(a_{-\infty}^{-1})). \qquad (2)$$

These chains are called variable length memory chains. If $\tau$ is unbounded, these chains are called unbounded variable length memory chains.

It remains to translate what we called the "reference" string into this framework. This will be done in Section 4, but first, let us give an example.



3. Discussion and examples 

3.1. Motivating the present work 

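Let us explain why the existing results of the literature cannot help us in our study. First, let us show that our chains need not be continuous. A family of transition probabilities $P$ is continuous if

$$\beta_k := \sup\{|P(a|a_{-\infty}^{-1}) - P(a|b_{-\infty}^{-1})| : a \in A,\ a_{-\infty}^{-1}, b_{-\infty}^{-1} \in A^{-\mathbb{N}},\ a_{-k}^{-1} = b_{-k}^{-1}\}$$

converges to zero when $k$ diverges. It is enough to consider the probabilistic context tree on $\{1,2\}$ illustrated in Figure 1(b) with the following transition probabilities:

$$p_i = \epsilon_1\mathbf{1}\{i \text{ is odd}\} + \epsilon_2\mathbf{1}\{i \text{ is even}\} \qquad\text{and}\qquad p_\infty := p(2|1^{+\infty}) = \epsilon_3,$$

where $\epsilon_1$, $\epsilon_2$ and $\epsilon_3$ are distinct real numbers in $(0,1)$. Then it is straightforward to check that

$$\beta_k = \max\{|\epsilon_1 - \epsilon_2|,\ |\epsilon_2 - \epsilon_3|,\ |\epsilon_1 - \epsilon_3|\}$$

for any $k \ge 0$, since two pasts ending with $1^k$ can still have contexts $1^i2$ with $i \ge k$ of either parity, or $1^{+\infty}$. It follows from this simple observation that the chains in the class we consider need not be continuous.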
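The value of $\beta_k$ claimed above can be checked numerically. This Python sketch (the function name `beta_k` and the `horizon` parameter are hypothetical) lists the values of $p(2|\cdot)$ reachable from pasts whose last $k$ symbols are all 1, truncating the infinite range $i \ge k$ to a finite horizon; because only the parity of $i$ matters, the truncation does not change the result.

```python
def beta_k(k, eps1, eps2, eps3, horizon=100):
    """Continuity rate beta_k for the renewal tree of Figure 1(b) with
    p_i = eps1 (i odd), eps2 (i even) and p_inf = eps3.

    Two pasts agreeing on their last k symbols have different contexts only
    if those k symbols are all 1s; their contexts are then 1^i 2 with i >= k,
    or 1^{+infinity}. Since A = {1, 2}, the gaps for a = 1 and a = 2 coincide,
    so beta_k is the spread of the reachable values of p(2|.).
    """
    values = [eps1 if i % 2 == 1 else eps2 for i in range(k, k + horizon)]
    values.append(eps3)
    return max(values) - min(values)


for k in (0, 1, 10, 100):
    print(k, beta_k(k, 0.2, 0.5, 0.9))  # the same value (about 0.7) for every k
```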

The second point motivating the present work is that the perfect simulation algorithm given by Comets et al. (2002) is not well adapted to context trees. This follows from the fact that it does not use the information of the context tree. Once more, to illustrate this fact, let us consider the simplest possible case, where $(\tau, p)$ is such that

• $\tau$ is the context tree illustrated in Figure 1(b),

• $p$ satisfies the continuity condition of Comets et al. (2002), and

• symbol 2 is $\epsilon$-regular.
In this case, there exists a very simple procedure to construct a sample $X_0^n$ of the chain compatible with $(\tau, p)$ from the stationary measure. We use the fact that 2 is $\epsilon$-regular to couple the constructed chain with an i.i.d. sequence $U$ of random variables uniformly distributed in $[0,1[$ in such a way that we put $X_i = 2$ whenever $U_i < \epsilon$. We generate $U_0, U_{-1}, \dots$ backward in time, and stop the procedure at the first time $-k$ such that $U_{-k} < \epsilon$. We write $-k = \theta[0,n]$; it is a regeneration time for $[0,n]$, and we put $X_{-k} = 2$. Then, we construct the sample recursively from time $-k+1$ up to $n$: for any $i \ge -k+1$, we put $X_i = 2$ if $U_i < \epsilon$, and otherwise we sample $X_i$ from the residual distribution $(p(\cdot|X_{i-1}\dots X_{-k}) - \epsilon\mathbf{1}\{\cdot = 2\})/(1-\epsilon)$. Observe that $p(\cdot|X_{i-1}\dots X_{-k})$ is always well defined since $X_{-k} = 2$. The constructed sample is stationary and compatible with $(\tau, p)$. The way we defined $\theta[0,n]$ implies that $-\theta[0,n]$ has a geometric distribution with parameter $\epsilon$. This procedure is well known in the perfect simulation literature; such a chain is said to be uniformly minorized by $\epsilon$ (see for example Foss and Tweedie (1998)). The perfect simulation algorithm we present in this paper works for much more general chains, and is an extension of the procedure we just described.
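The procedure above can be sketched in Python as follows. This is a minimal illustration for the renewal tree only, not the general algorithm of this paper: `p2(i)` is a hypothetical user-supplied function standing for $p_i = p(2|1^i2)$, assumed to satisfy `p2(i) >= eps`, and the `elif` branch realizes the residual distribution, since given $U_i \ge \epsilon$ the event $\epsilon \le U_i < p_i$ has probability $(p_i - \epsilon)/(1-\epsilon)$.

```python
import random


def perfect_sample_renewal(n, eps, p2, rng):
    """Perfect sample of X_0^n for the renewal chain of Figure 1(b).

    p2(i) plays the role of p_i = p(2 | 1^i 2); it is assumed to satisfy
    p2(i) >= eps for every i (symbol 2 is eps-regular).
    Returns (theta, xs) with theta = theta[0, n] and xs[j] = X_{theta + j}.
    """
    # Draw U_0, U_{-1}, ... backward in time until the first U_{-k} < eps.
    backward = []                  # backward[j] holds U_{-j}
    while True:
        backward.append(rng.random())
        if backward[-1] < eps:
            break
    theta = -(len(backward) - 1)
    # U_theta, ..., U_0 in increasing time order, then fresh U_1, ..., U_n.
    us = backward[::-1] + [rng.random() for _ in range(n)]

    xs = [2]                       # X_theta = 2, since U_theta < eps
    ones = 0                       # current context is 1^ones 2
    for u in us[1:]:
        if u < eps:                # spontaneous 2, whatever the past
            x = 2
        elif u < p2(ones):         # residual mass p_i - eps of symbol 2
            x = 2
        else:                      # remaining mass 1 - p_i goes to symbol 1
            x = 1
        xs.append(x)
        ones = 0 if x == 2 else ones + 1
    return theta, xs


rng = random.Random(0)
theta, xs = perfect_sample_renewal(5, 0.2, lambda i: 0.3 if i % 2 else 0.6, rng)
print(theta, xs[-6:])              # xs[-6:] is the window X_0, ..., X_5
```

With a constant `p2` the chain is i.i.d., and the returned $X_0$ equals 2 exactly when $U_0$ falls below that constant, which gives a quick sanity check of the coupling.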



Now, suppose we perform the above algorithm and the one of Comets et al. (2002) at the same time, using the same sequence $U$, to perfectly simulate a window $X_0^n$. Call $\theta^{CFF}[0,n]$ the regeneration time obtained using their algorithm. The most objective way to compare both algorithms is to check which one is faster. Their random variable $\theta^{CFF}[0,n]$ is defined by

$$\theta^{CFF}[0,n] = \max\{k \le 0 : U_i < a_{i-k},\ i = k, \dots, n\},$$

where $(a_j)_{j\ge0}$ is a $[0,1]$-valued sequence increasing to 1. The way it increases depends on the continuity assumption they make. Anyway, to compare with the above algorithm, we assume that $a_0 = \epsilon$ if only 2 is $\epsilon$-regular, and $a_0 = 2\epsilon$ if both symbols are $\epsilon$-regular. If $\theta^{CFF}[0,n] = -k$, it means that $U_{-k} < a_0$, and in this case their algorithm puts, say, $X_{-k} = 2$ if $U_{-k} < \epsilon$ and $X_{-k} = 1$ if $\epsilon \le U_{-k} < 2\epsilon$ (if 1 is also $\epsilon$-regular). The only way their algorithm could be faster than the above algorithm would be that their regeneration time occurs before the first time $U_{-k} < \epsilon$, that is,

$$\theta^{CFF}[0,n] > \max\{k \le 0 : U_k < \epsilon\} =: \rho.$$

Therefore, denoting by $\mathbb{P}$ the law of $U$,

$$\mathbb{P}(\theta[0,n] < \theta^{CFF}[0,n]) = \mathbb{P}(\rho < \theta^{CFF}[0,n]).$$

It is difficult to find a good upper bound for this probability in general. We just mention two simple cases. In the case where $\inf_{v\in\tau} p(1|v) = 0$, it is clear that $\mathbb{P}(\rho < \theta^{CFF}[0,n]) = 0$. The other case we can study easily is when $\inf_{v\in\tau} p(1|v) = \epsilon$. A quick look at their algorithm shows that in this case, if $\rho < \theta^{CFF}[0,n]$, then the reconstructed sample is all 1s: $X_{\theta^{CFF}[0,n]}^{n} = 1^{|\theta^{CFF}[0,n]|+n+1}$. Let us compute

$$\mathbb{P}(\rho < \theta^{CFF}[0,n]) = \sum_{i>0}\mathbb{P}(\rho = -i,\ \theta^{CFF}[0,n] > -i) = \sum_{i>0}\sum_{j=0}^{i-1}\mathbb{P}(\rho = -i,\ \theta^{CFF}[0,n] = -j).$$

Since the event $\{\theta^{CFF}[0,n] = -j\}$ depends only on $U_{-j}^{n}$, while $\{\rho = -i\}$ requires $U_{-i} < \epsilon$ and $U_{-i+1}, \dots, U_{-j-1} \ge \epsilon$, it follows that the latter term is bounded above by

$$\sum_{i>0}\sum_{j=0}^{i-1}\epsilon(1-\epsilon)^{i-j-1}\,\mathbb{P}(X_{-j}^{n} = 1^{n+j+1}) = \sum_{j\ge0}\mathbb{P}(X_{-j}^{n} = 1^{n+j+1}).$$

Using the fact that the symbol 2 is $\epsilon$-regular, the probability $\mathbb{P}(X_{-j}^{n} = 1^{n+j+1})$ is bounded above by $(1-\epsilon)^{n+j+1}$, and it follows that for some constant $C > 0$ (for instance $C = (1-\epsilon)/\epsilon$)

$$\mathbb{P}(\theta[0,n] < \theta^{CFF}[0,n]) \le C(1-\epsilon)^{n},$$

which goes to 0 very fast. These facts are a consequence of the following general remark. Since their algorithm does not use the form of the context tree, it leads to regeneration times that have, a priori, nothing to do with the natural regeneration times one could expect, namely the successive occurrences of 2 along the realization of the chain. This leads to another misleading situation: their regeneration times cannot be seen on the realization of the chain.

3.2. Example 

As we said, the symbol 2 is a renewal symbol for the chain compatible with the context tree of Figure 1(b). Therefore, an example of extension of this model is the following: "if the last occurrence of 2 occurred at distance $i$ in the past (that is, if the last $i$ symbols are all 1s), then look back $i$ sites further than this occurrence". In other words, the contexts have the form

$$a_{-2i-1}^{-i-2}\,2\,1^{i}, \qquad \forall i \ge 0 \text{ and } a_{-2i-1}^{-i-2} \in A^{i},$$

the context tree is

$$\tau = \bigcup_{i\ge0}\ \bigcup_{a\in A^{i}} \{a\,2\,1^{i}\} \qquad (3)$$

and it is represented in Figure 2.



Figure 2: The upper part of the context tree $\tau$ defined by (3); the annotations indicate the distance to the reference string, the occurrence of the reference string, and how much further than the reference string we need to look. We only specify some of the contexts.



Our results say that, assuming $\inf_{v\in\tau} p(2|v) \ge \epsilon > 0$, we can perfectly simulate the unique stationary chain $X$ compatible with $(\tau, p)$ (Theorem 5.1). The perfect simulation algorithm extends the algorithm we briefly described in Subsection 3.1 for the renewal chain. The main difference is in the definition of the regeneration time. In Section 9 we will give an explicit perfect simulation of this chain. We also show in this work that, almost surely, infinitely many occurrences of 2 split the realization of $X$ into independent and identically distributed strings (Theorem 5.2). However, since any occurrence of 2 can be bypassed by a future context with positive probability, this "regeneration scheme" differs substantially from the "renewal scheme".
4. On the form of our context trees



The examples of the preceding Section gave us an idea of how our families can be 
described in terms of probabilistic context trees. The aim of the following definitions 
is to define, using the probabilistic context tree framework, what we called "refer- 
ence string". At the end of this section, we give several examples explaining these 
definitions. 

Suppose we are given an unbounded context tree $\tau$. For any finite string $w$ of $A^\star$ we define the function $m_w$ which associates to any context $v \in \tau$ the integer

$$m_w(v) := \inf\{j : 0 \le j \le |v| - |w|,\ \text{such that } v_{-j-|w|}^{-j-1} = w\}, \qquad (4)$$

with the convention that $m_w(v) = +\infty$ if the set of indexes is empty. In the context tree, $m_w(v)$ is the distance between the root and the first occurrence of $w$ in the context $v$. If a context $v$ is such that $m_w(v) = k$, then it can be written as the concatenation

$$v = \dots v_{-k-|w|-1}\, w\, v_{-k}\cdots v_{-1},$$

where $v_{-j}^{-j+|w|-1} \neq w$ for $j = |w|, \dots, k+|w|-1$. The context trees considered in the present work have the following form. There exist a finite string $w$ of $A^\star$ and, related to this string, a function $\ell_w : \mathbb{N} \to \mathbb{N}$ satisfying $\ell_w(k) < +\infty$ for any $k \ge 0$, such that for any $v \in \tau$

$$|v| = m_w(v) + |w| + \ell_w(m_w(v)).$$

The string $w$ is the reference string for the context tree $\tau$. The function $\ell_w$ tells us "how much further" than the last occurrence of $w$ we need to look back. It is precisely the notion of reference string which generalizes the notion of renewal string.
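Under the form $|v| = m_w(v) + |w| + \ell_w(m_w(v))$, the context of a finite past can be computed directly from $w$ and $\ell_w$. The Python sketch below assumes that pasts and $w$ are tuples in time order (most recent symbol last); `context_with_reference` is a hypothetical helper name, and returning `None` plays the role of the convention $c_\tau = \emptyset$. The three calls correspond to $\ell_2 \equiv 0$ (Figure 1(b)), $\ell_2(k) = k$ (tree (3)) and $\ell_2(k) = 1 + k^2$ (Figure 3(a)).

```python
def context_with_reference(past, w, ell):
    """Context of a finite past for a tree with reference string w and function ell.

    `past` and `w` are tuples in time order (most recent symbol last).
    Following (4), m is the distance from the present to the last occurrence
    of w; the context then has length m + |w| + ell(m). Returns None when the
    finite past does not determine the context (no occurrence of w, or not
    enough symbols before it), i.e. the convention c_tau = empty.
    """
    n, lw = len(past), len(w)
    for m in range(0, n - lw + 1):           # m = 0, 1, ...: scan backward
        if tuple(past[n - m - lw:n - m]) == tuple(w):
            length = m + lw + ell(m)
            return tuple(past[n - length:]) if length <= n else None
    return None


past = (1, 2, 1, 2, 1, 1)
print(context_with_reference(past, (2,), lambda k: 0))          # (2, 1, 1)
print(context_with_reference(past, (2,), lambda k: k))          # (2, 1, 2, 1, 1)
print(context_with_reference(past, (2,), lambda k: 1 + k * k))  # None: 8 symbols needed
```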

Let us give some examples for the reader to see how the notion of reference string appears in the shape of the context trees.



4.1. Example of Figure 1(b)



The reference string is the symbol 2. The function $\ell_2$ is identically 0, reflecting the fact that 2 is a renewal symbol.



4.2. Example of Figures 2 and 3(a)



The reference string is also the symbol 2 in both figures. In Figure 2, $\ell_2$ is the identity, as we said in Section 3. Comparing Figures 1(b) and 2 makes clear that the notion of reference string generalizes the notion of renewal symbol.

To simplify Figure 3(a), we drew small triangles for subtrees. These subtrees are context trees of finite height since 2 is a reference string. Suppose the context tree illustrated in Figure 3(a) is such that $\ell_2(k) = 1 + k^2$. It follows that any context of the context tree $\tau_{\text{Figure 2}}$ is a suffix of a context of $\tau_{\text{Figure 3(a)}}$. In this case, we write $\tau_{\text{Figure 2}} \preceq \tau_{\text{Figure 3(a)}}$, and observe that

$$\tau_{\text{Figure 1(b)}} \preceq \tau_{\text{Figure 2}} \preceq \tau_{\text{Figure 3(a)}}.$$

There is a difference between $\tau_{\text{Figure 3(a)}}$ and the two others: the reference string 2 is not a context.

4.3. Example of Figure 3(b)

The reference string is 12 and it is not a context. Five contexts of infinite length are visible in the figure, but there are infinitely many of them.
Figure 3: The upper part of two unbounded context trees; the one of (a) has the symbol 2 as reference string (panel (a) is labelled with $\ell_2(1) = 2$ and $\ell_2(2) = 5$), the one of (b) has the string 12.



5. Perfect simulation algorithm and Statement of the results 

Consider an unbounded probabilistic context tree $(\tau, p)$. We recall that $\mathcal{S}$ is the set of elements of $A$ which are $\epsilon$-regular for some fixed $\epsilon > 0$. We will assume without loss of generality that $A = \{1, 2, \dots\}$ and $\mathcal{S} = \{1, 2, \dots, \#\mathcal{S}\}$ ($\#\mathcal{S}$ denotes the cardinality of $\mathcal{S}$). We introduce a partition of $[0,1[$, illustrated in Figure 4, which will be used for the construction of the chain. Define for any $a \in \mathcal{S}$ and any $v \in \tau$ the intervals



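$$J(a|\emptyset) = [(a-1)\epsilon,\ a\epsilon[, \qquad J(a|v) = \Big[\#\mathcal{S}\epsilon + \sum_{i=1}^{a-1}(p(i|v)-\epsilon),\ \#\mathcal{S}\epsilon + \sum_{i=1}^{a}(p(i|v)-\epsilon)\Big[$$

and

$$K(a|v) = J(a|\emptyset) \cup J(a|v).$$

For any $a \in A \setminus \mathcal{S} = \{\#\mathcal{S}+1, \#\mathcal{S}+2, \dots\}$,

$$K(a|v) = J(a|v) = \Big[\sum_{i=1}^{a-1}p(i|v),\ \sum_{i=1}^{a}p(i|v)\Big[.$$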



Observe that for any $v \in \tau$

$$J(1|\emptyset), \dots, J(\#\mathcal{S}|\emptyset),\ J(1|v), \dots, J(\#\mathcal{S}|v),\ J(\#\mathcal{S}+1|v), \dots$$

defines a partition of $[0,1[$ (see Figure 4), and that for any $a \in A$ and any $v \in \tau$

$$\lambda(K(a|v)) = p(a|v), \qquad (5)$$

where $\lambda$ denotes the Lebesgue measure on $[0,1[$.
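The partition and identity (5) can be checked on a concrete context. This Python sketch assumes a single hypothetical context $v$ with finitely many symbols labelled $1, 2, \dots$ and $\mathcal{S} = \{1, \dots, \#\mathcal{S}\}$, as in the text; intervals are stored as `(left, right)` pairs standing for $[\text{left}, \text{right}[$, and the helper names are illustrative only.

```python
def partition(p_v, regular, eps):
    """Intervals J(a|empty) and J(a|v) of [0, 1[ for one context v.

    p_v: dict {symbol: p(symbol|v)} with symbols 1, 2, ... and the
    eps-regular symbols being 1, ..., #S, so sorted order lists them first.
    regular: the eps-regular symbols, in increasing order.
    Intervals are returned as (left, right) pairs meaning [left, right[.
    """
    j_empty = {a: (j * eps, (j + 1) * eps) for j, a in enumerate(regular)}
    j_v, left = {}, len(regular) * eps
    for a in sorted(p_v):
        width = p_v[a] - eps if a in regular else p_v[a]
        j_v[a] = (left, left + width)
        left += width
    return j_empty, j_v


def measure_K(a, j_empty, j_v):
    """Lebesgue measure of K(a|v) = J(a|empty) U J(a|v) (a disjoint union)."""
    lo, hi = j_empty.get(a, (0.0, 0.0))
    return (hi - lo) + (j_v[a][1] - j_v[a][0])


# Hypothetical context v on A = {1, 2, 3} with S = {1, 2} and eps = 0.1.
p_v = {1: 0.5, 2: 0.3, 3: 0.2}
j_empty, j_v = partition(p_v, regular=[1, 2], eps=0.1)
for a in p_v:
    print(a, j_v[a], round(measure_K(a, j_empty, j_v), 10))  # last column: p(a|v)
```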

Let $U = (U_n)_{n\in\mathbb{Z}}$ be a sequence of i.i.d. random variables uniformly distributed in $[0,1[$ and defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. All the chains considered in what follows will be constructed using this sequence $U$.

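Figure 4: Illustration of the partition of $[0,1[$ with the disjoint intervals $\{J(a|\emptyset)\}_{a\in\mathcal{S}}$ and $\{J(a|v)\}_{a\in A}$ for some $v \in \tau$. The first $\#\mathcal{S}\epsilon$ of the unit interval holds $J(1|\emptyset), \dots, J(\#\mathcal{S}|\emptyset)$, each of length $\epsilon$; the remaining $1 - \#\mathcal{S}\epsilon$ holds $J(1|v), J(2|v), \dots$, with $\lambda(J(a|v)) = p(a|v) - \epsilon$ for $a \in \mathcal{S}$ and $\lambda(J(a|v)) = p(a|v)$ otherwise.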



We construct a deterministic measurable function $X : [0,1[^{\mathbb{Z}} \to A^{\mathbb{Z}}$ such that the law $\mathbb{P}(X(U) \in \cdot)$ is compatible with $(\tau, p)$. The construction of this function is carried out in such a way that, for any $n \in \mathbb{Z}$, $[X(U)]_n = a$ whenever $U_n \in J(a|\emptyset)$, for any $a \in \mathcal{S}$. Suppose that for some time index $n \in \mathbb{Z}$ there exists a string $a_{-k}^{-1} \in A^k$ such that $U_{n-i} \in J(a_{-i}|\emptyset)$, $i = 1, \dots, k$; in this case, we put

$$[X(U)]_{n-k}^{n-1} = a_{-k}^{-1}.$$

This is a sample that has been spontaneously constructed. We have three situations at this point: (i) $U_n$ belongs to $[0, \#\mathcal{S}\epsilon[$, (ii) $U_n$ belongs to $[\#\mathcal{S}\epsilon, 1[$ and $c_\tau(a_{-k}^{-1}) = v \in \tau$, and (iii) $U_n$ belongs to $[\#\mathcal{S}\epsilon, 1[$ and $c_\tau(a_{-k}^{-1}) = \emptyset$. In situation (iii), we are not able to construct $[X(U)]_n$ knowing only $[X(U)]_{n-k}^{n-1}$, and therefore we need to determine more past symbols. In situations (i) and (ii), we can construct $[X(U)]_n$ independently of $[X(U)]_{-\infty}^{n-k-1}$: we put, for any $a \in A$,

$$[X(U)]_n = a \qquad\text{if } U_n \in K(a\,|\,c_\tau(a_{-k}^{-1})).$$
Observe that $[X(U)]_{n-k}^{n}$ has been constructed independently of $U_{-\infty}^{n-k-1}$ and $U_{n+1}^{+\infty}$. Suppose we want to sample the value of the stationary chain at time 0. The idea of the algorithm is to generate the $U_i$'s backward in time until the first time $k \le 0$ in the past such that we can carry out the above construction from $k$ to 0 without using $U_{-\infty}^{k-1}$ and $U_1^{+\infty}$. This is a CFTP algorithm.

The size of the suffix of the past we need to know to construct the next symbol depends here on the previously constructed past itself (except for the time indexes where the symbol is spontaneously constructed). In the CFTP algorithm introduced by Comets et al. (2002), the size of this suffix of the past is defined by an i.i.d. random variable, totally independent of the values assumed by the chain. This is the main difference between both works from the technical point of view, and it makes our perfect simulation algorithm a little more complicated.

The three cases enumerated above by (i), (ii) and (iii) are formally described by the measurable function $F : [0,1[ \times (A^\star \cup A^{-\mathbb{N}}) \to A \cup \{\star\}$ defined as follows: for any $a_m^n \in A^\star \cup A^{-\mathbb{N}}$, $-\infty \le m \le n+1$,

$$F(u, a_m^n) = \sum_{a\in A} a\,\mathbf{1}\{u \in K(a\,|\,c_\tau(a_m^n))\} + \star\,\mathbf{1}\{u \in [\#\mathcal{S}\epsilon, 1[,\ c_\tau(a_m^n) = \emptyset\}, \qquad (6)$$

with the conventions that $a_{n+1}^n = \emptyset$ and $c_\tau(\emptyset) = \emptyset$. When $c_\tau(a_m^n) \neq \emptyset$, we have

$$\mathbb{P}(F(U_{n+1}, a_m^n) = a) = \mathbb{P}(U_{n+1} \in K(a\,|\,c_\tau(a_m^n))) = \lambda(K(a\,|\,c_\tau(a_m^n))) = p(a\,|\,c_\tau(a_m^n)). \qquad (7)$$

This function is an update function. Since we consider chains of infinite order, this update function may return the symbol $\star$, meaning that we do not have sufficient knowledge of the past to continue the construction.

We define for any $m \le n$ the $\mathcal{F}(U_m^n)$-measurable function $\mathcal{C} : [0,1[^{n-m+1} \to \{0,1\}$ which takes value 1 if and only if we can construct $[X(U)]_m^n$ independently of $U_{-\infty}^{m-1}$ and $U_{n+1}^{+\infty}$ using the construction described above. Formally

$$\{\mathcal{C}(U_m^n) = 1\} := \bigcup_{a_m^n \in A^{n-m+1}}\ \bigcap_{l=m}^{n}\{F(U_l, a_m^{l-1}) = a_l\}.$$

Finally, we define for any $m \le n \le +\infty$

$$\theta[m,n] := \max\{k \le m : \mathcal{C}(U_k^n) = 1\} \qquad (8)$$

with the convention that $\theta[m] := \theta[m,m]$. This time is called the regeneration time for the window $[m,n]$, and is the first time before $m$ such that the construction described above is successful until time $n$.

We now state our main results and present a "simplistic" perfect simulation algorithm (Algorithm 1) for the construction of a sample $[X(U)]_m^n$. A more "realistic" one (Algorithm 2) is given in Section 9, together with an explicit perfect simulation in the particular case of Section 3.

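Theorem 5.1. (Perfect simulation). Consider a probabilistic context tree $(\tau, p)$ having an $\epsilon$-regular reference string $w$. If

$$\limsup_{k\to\infty}\frac{\ell_w(k)}{e^{\alpha_\epsilon k}} < 1, \qquad \alpha_\epsilon := -\frac{1}{|w|}\log\left(1 - \epsilon^{|w|}\right) > 0, \qquad (9)$$

then Algorithms 1 and 2 stop almost surely after a finite number of steps, i.e., we have for any $-\infty < m \le n < +\infty$

$$\mathbb{P}(\theta[m,n] > -\infty) = 1. \qquad (10)$$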
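Condition (9) compares the look-back function $\ell_w$ with $e^{\alpha_\epsilon k}$. The Python sketch below computes $\alpha_\epsilon$ and a finite-range proxy for the limsup; the helper names and the range `k_min..k_max` are hypothetical choices, and since a limsup cannot be verified numerically, the check is only a heuristic, not a proof that (9) holds.

```python
import math


def alpha_eps(eps, w_len):
    """alpha_eps = -(1/|w|) log(1 - eps^|w|) from condition (9)."""
    return -math.log(1.0 - eps ** w_len) / w_len


def tail_ratio(ell, eps, w_len, k_min=100, k_max=200):
    """max of ell(k) / exp(alpha_eps k) over k_min <= k <= k_max.

    Only a finite-range heuristic for the limsup in (9): a value well
    below 1 is suggestive, not a proof.
    """
    a = alpha_eps(eps, w_len)
    return max(ell(k) * math.exp(-a * k) for k in range(k_min, k_max + 1))


print(alpha_eps(0.1, 1))                            # about 0.105
print(tail_ratio(lambda k: k, 0.1, 1) < 1)          # True: tree (3), ell(k) = k
print(tail_ratio(lambda k: 1 + k * k, 0.1, 1) < 1)  # True: Figure 3(a)
print(tail_ratio(lambda k: 2 ** k, 0.1, 1) < 1)     # False: exponential look-back
```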






Algorithm 1 "Simplistic" perfect simulation algorithm of the sample [X(U)] 
1: Input: m, n; Output: 9[m,n], ([X(U)] g[nhn] , . . . , [X(U)]„) 
2: Sample U m , . . . , U n uniformly in [0, 1[ 

3: i<-m, 9[m,n] <- m, [X(U)]« <- * 7l - rn+1 , £([/") <- 
4: while £([/") ^ 1 do 

5: i<-i — l 

Choose Ui uniformly in [0, 1[ 
6: end while 
7: 9[m, n] i 
8: while [X(U)]„ = * do 
9: [A-(U)]i <- F(Ui, [X(U)]^ n] ) 

j «— i + 1 
10: end while 

11: return 0[m,ra], {[X(XJ)]g [m>n] , . . . , [X(U)]„) 
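For the example of Section 3.2, Algorithm 1 can be sketched in Python as below. The transition probabilities `p_tree3` are hypothetical (any choice with $\inf_v p(2|v) \ge \epsilon$ would do), `update` implements (6) with $\mathcal{S} = \{2\}$ listed first (a relabelling of the convention $\mathcal{S} = \{1, \dots, \#\mathcal{S}\}$), and `construct_from` evaluates $\mathcal{C}(U_k^n)$ and the forward construction in a single pass, which is possible because the construction is a deterministic function of $U_k^n$. All names are illustrative assumptions.

```python
import random

STAR = None  # plays the role of the symbol "*" returned by F


def context_tree3(past):
    """c_tau(past) for the tree (3) on A = {1, 2}; None if not determined."""
    n, i = len(past), 0
    while i < n and past[n - 1 - i] == 1:   # i = distance to the last 2
        i += 1
    length = 2 * i + 1                      # i symbols, then 2, then 1^i
    return tuple(past[n - length:]) if i < n and length <= n else None


def p_tree3(a, v):
    """Hypothetical p(a|v): p(2|v) is 0.3 or 0.7, so 2 is 0.3-regular."""
    q = 0.3 if v.count(2) % 2 else 0.7
    return q if a == 2 else 1.0 - q


def update(u, past, alphabet=(2, 1), regular=(2,), eps=0.3):
    """Update function F of (6), with the eps-regular symbols listed first."""
    s_eps = len(regular) * eps
    if u < s_eps:                           # u lies in some J(a|empty)
        return regular[min(int(u // eps), len(regular) - 1)]
    v = context_tree3(past)
    if v is None:                           # u in [#S eps, 1[, empty context
        return STAR
    right = s_eps                           # scan J(a|v) from left to right
    for a in alphabet:
        right += p_tree3(a, v) - (eps if a in regular else 0.0)
        if u < right:
            return a
    return alphabet[-1]                     # guard against rounding near 1


def algorithm1(m, n, rng):
    """Return theta[m, n] and the list [X(U)]_theta, ..., [X(U)]_n."""
    U = {i: rng.random() for i in range(m, n + 1)}

    def construct_from(k):
        # Forward construction from time k; None means C(U_k^n) = 0.
        xs = []
        for i in range(k, n + 1):
            x = update(U[i], tuple(xs))
            if x is STAR:
                return None
            xs.append(x)
        return xs

    i = m
    xs = construct_from(i)
    while xs is None:                       # C(U_i^n) != 1: go further back
        i -= 1
        U[i] = rng.random()
        xs = construct_from(i)
    return i, xs


theta, xs = algorithm1(0, 5, random.Random(2))
print(theta, xs)
```

The first constructed symbol is always spontaneous (the past is empty at time $\theta[m,n]$), so with $\mathcal{S} = \{2\}$ the returned list always starts with 2.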



In the rest of the paper, we will often write $X_i$ for $[X(U)]_i$ (and $X$ for $X(U)$) in order to avoid overloaded notation, keeping in mind that for any $i$, $X_i$ is constructed as a deterministic function of $U$. Actually, by Theorem 5.1, $X_i$ depends only on a $\mathbb{P}$-a.s. finite part of this sequence: $X_i := [X(\dots, u_{\theta[i]-1}, U_{\theta[i]}, \dots, U_i, u_{i+1}, \dots)]_i$ for any $u \in [0,1[^{\mathbb{Z}}$.

Corollary 5.1. (Existence and uniqueness). The outputs of Algorithms 1 and 2 are samples of the unique stationary chain compatible with $(\tau, p)$. We will call $\mu$ the stationary measure of $X$:

$$\mu := \mathbb{P}(X(U) \in \cdot).$$

Note that the $\epsilon$-regularity assumption for the symbols appearing in the reference string $w$ is weaker than the regularity (also called strong non-nullness) assumption of the literature. And we know that this latter condition implies neither existence nor uniqueness of the stationary measure (see for example Bramson and Kalikow (1993)). Our $\epsilon$-regularity condition may rather be compared to the weak non-nullness assumption, which requires $\sum_{a\in A}\inf_{v\in\tau} p(a|v) > 0$ and which is assumed for example by Comets et al. (2002). In fact, this condition is very useful from our constructive point of view: it allows us to have symbols which appear spontaneously, and makes the CFTP easier to perform.

The proof of Corollary 5.1 using the CFTP algorithm and Theorem 5.1 can be found in Comets et al. (2002) (Proposition 6.1 for the existence statement and Corollary 4.1 for the uniqueness statement). We omit these proofs in the present work in order to save space, but we mention the main lines. The existence statement follows once we observe that Theorem 5.1 implies that one can construct a bi-infinite sequence $X$ verifying, for any $n \in \mathbb{Z}$, $X_n = F(U_n, X_{-\infty}^{n-1})$. By (7), this chain is therefore compatible in the sense of (2). It is stationary by construction. The uniqueness statement follows from the loss of memory the chain inherits from the existence of almost surely finite regeneration times.

We call time $t$ a regeneration time for the chain $X$ if $\theta[t, +\infty] = t$. Define the chain $\xi$ on $\{0,1\}$ by $\xi_j := \mathbf{1}\{j = \theta[j, +\infty]\}$. Then, consider the sequence of time indexes $T$ defined such that $\xi_j = 1$ if and only if $j = T_l$ for some $l$ in $\mathbb{Z}$, with $T_l < T_{l+1}$ and the convention $T_0 \le 0 < T_1$. We say that $X$ has a regeneration scheme if the chain $\xi$ is renewal (that is, if the increments $(T_{i+1} - T_i)_{i\in\mathbb{Z}}$ are independent, and are identically distributed for $i \neq 0$).
Corollary 5.2. (Regeneration scheme). Under the conditions of Theorem 5.1, the chain $X$ has a regeneration scheme. The random strings $([X(U)]_{T_i}, \dots, [X(U)]_{T_{i+1}-1})_{i\neq0}$ are i.i.d. and have finite expected size.

In words, this corollary states that the unique stationary chain compatible with $(\tau, p)$ under the conditions of Theorem 5.1 can be viewed as an i.i.d. concatenation of strings of symbols of $A$ having finite expected size. A similar result was first obtained by Lalley (1986) for one-dimensional Gibbs states under appropriate conditions on the continuity rate, and then by Comets et al. (2002) under weaker conditions than those of Lalley (1986). It is a hidden regeneration scheme, because it uses the sequence $U$. The main reason why we give this result is that it arises naturally from our perfect simulation approach.

The visible regeneration scheme involves several technical complications, even if, in spirit, it is similar to the preceding one. We postpone the precise definitions to Section 8 and give the following simplified statement.

Theorem 5.2. (Visible regeneration scheme). Suppose $(\tau,p)$ satisfies the conditions of Theorem 5.1. Then, for $\mu$-almost every realization of the chain $X$ compatible with $(\tau,p)$, there exists a sequence of random times $T^X$ such that (i) for any $i \in \mathbb{Z}$, the event $\{T^X_i = k\}$ is measurable with respect to the $\sigma$-algebra generated by $X_k^{+\infty}$, and (ii) conditionally on $T^X$, the strings $(X_{T^X_i}, \dots, X_{T^X_{i+1}-1})_{i\neq 0}$ are i.i.d. and have finite expected size.

Observation 5.1 (Monotonicity). Suppose the probabilistic context tree $(\tau,p)$ satisfies the conditions of the above results. Then all the above results hold true for any probabilistic context tree $(\tau',p')$ such that $\tau' \preceq \tau$ and for which $w$ is $\epsilon$-regular.



6. Proof of Theorem 5.1



A slight complication arises from the fact that the random variable $\theta$ depends on the values assumed by the chain $X$ along its construction. In the first step of the proof (Subsections 6.1, 6.2, 6.3 and 6.4), we define another random variable, which we denote $\bar\theta$ (see (18)), with the following properties: (i) it only depends on the spontaneous occurrences of $w$ along the construction, and (ii) it can be used to define a lower bound for $\theta$.

In a second step (Subsections 6.5 and 6.6), we relate the distribution of $\bar\theta$ to the probability of return to the state 0 for an $\mathbb{N}$-valued auxiliary process which also depends on the spontaneous occurrences of $w$. At this point, there is a clear similarity with the proof of Comets et al. (2002), the principal difference being that our auxiliary process is not the house of cards process, but is defined through (23). The proof of Theorem 5.1 is finally given in Subsection 6.7.



6.1. Simplification of the problem 

Suppose we are given a probabilistic context tree $(\tau,p)$ having an $\epsilon$-regular reference string $w = w_{-|w|}^{-1}$, to which corresponds the function $\ell^w$. Owing to Observation 5.1, there is no loss of generality in restricting the proof to the case where (i) only the branches having $w$ as a subsequence have finite length and (ii) $\ell^w$ is increasing and goes to infinity. Observe that this is, for example, the case of the context tree illustrated in Figure 3(b).

We define a new stochastic chain $Z$: for any $i \in \mathbb{Z}$, $Z_i = a$ if $U_i$ belongs to $J(a|\emptyset)$, and $Z_i = \star$ otherwise. This chain takes into account only the symbols which appear spontaneously in $X$: $X_i = a$ whenever $Z_i = a$, and in particular $X_{i-|w|+1}^{i} = w$ whenever $Z_{i-|w|+1}^{i} = w$, for any $i \in \mathbb{Z}$. We also define the $\mathbb{N}$-valued random variables $m_i(U) = m_i$ and $L_i(U) = L_i$ as follows:

$$m_i := \inf\{k \ge 0 : Z_{i-k-|w|}^{i-k-1} = w\},$$

which is the distance to the last occurrence of $w$ in $Z_{-\infty}^{i-1}$, and

$$L_i := \begin{cases} 0 & \text{if } U_i \in \bigcup_{a\in A} J(a|\emptyset) \text{ (that is, } Z_i \neq \star), \\ m_i + |w| + \ell^w(m_i) & \text{otherwise.} \end{cases} \tag{11}$$

The reason why we introduced these random variables is that if we have $\xi(U_m^n) = 1$ for $-\infty < m \le n < +\infty$ and $L_{n+1} \le n - m + 1$, then $L_{n+1}$ is an upper bound for the number of sites in the past we need to know in order to decide the state at time $n+1$ using the perfect simulation algorithm. To see this, suppose that for some $-\infty < m \le n < +\infty$ we have $\xi(U_m^n) = 1$ and $0 < L_{n+1} \le n - m + 1$. Since every occurrence of $w$ in $Z_m^n$ is also an occurrence of $w$ in $X_m^n$, it follows that the distance to the last occurrence of $w$ in $Z_{-\infty}^{n}$ is at least the distance to the last occurrence of $w$ in the context $c_\tau(X_m^n)$ (recall the definition of $m_w$):

$$m_{n+1} \ge m_w(c_\tau(X_m^n)).$$

We also recall that for any $v \in \tau$ in which the string $w$ appears, we have

$$|v| = m_w(v) + |w| + \ell^w(m_w(v)). \tag{12}$$

Therefore, since $\ell^w$ is increasing, definition (11) shows that whenever $L_{n+1} > 0$, $L_{n+1}$ is an upper bound for the size of the context needed at time $n+1$:

$$|c_\tau(X_m^n)| \le L_{n+1}.$$

Observe also that $L_{n+1} = 0$ if and only if the symbol appears spontaneously at time $n+1$.
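The following Python sketch computes $m_i$ and $L_i$ of (11) on a finite window of $Z$. It is an illustration under stated assumptions rather than part of the construction: the window is a list whose index 0 is the oldest site, the hypothetical constant `STAR` stands for $\star$, $w$ and $\ell^w$ are supplied by the caller, and `None` is returned when the window does not reach far enough into the past to determine $m_i$.

```python
STAR = "*"  # hypothetical marker for a site whose symbol is not spontaneous


def last_occurrence_distance(Z, i, w):
    """m_i: smallest k >= 0 with Z[i-k-|w| : i-k] == w, i.e. the distance to the
    last occurrence of w in Z[0:i]. Returns None if w does not occur there
    (the paper uses the infinite past; here only a finite window is visible)."""
    lw = len(w)
    for k in range(0, i - lw + 1):
        start = i - k - lw
        if Z[start:start + lw] == w:
            return k
    return None


def arrow_length(Z, i, w, ell):
    """L_i from (11): 0 if the symbol at site i is spontaneous (Z_i != STAR),
    m_i + |w| + ell(m_i) otherwise; None if m_i is not determined by the window."""
    if Z[i] != STAR:
        return 0
    m = last_occurrence_distance(Z, i, w)
    if m is None:
        return None
    return m + len(w) + ell(m)


# Hypothetical example: w = "2", ell^w(m) = m, sites 0..5 of Z.
Z = [STAR, 2, STAR, STAR, 1, STAR]
print([arrow_length(Z, i, [2], lambda m: m) for i in range(len(Z))])
```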

6.2. Example of Figure 3(b)

In this context tree, the symbol 2 is the reference string, and we suppose that it is the only $\epsilon$-regular symbol. In this particular case, $|w| = 1$ and $Z$ turns out to be an i.i.d. chain taking value 2 with probability $\epsilon$ and $\star$ with probability $1-\epsilon$. Let us consider the random variable

$$\theta^{|w|=1}[0,n] := \max\{j \le 0 : L_i \le i - j,\ i = j, \dots, n\}. \tag{13}$$

The best way to understand the usefulness of this new random variable is to explain that for any $n \ge 0$, on the set $\{U : \theta^{|w|=1}[0,n] > -\infty\}$ we have

$$\theta^{|w|=1}[0,n] \le \theta[0,n]. \tag{14}$$

To simplify notation, let us write $\theta' := \theta^{|w|=1}[0,n]$. To each time $i \in \{\theta', \dots, n\}$, we associate an arrow going from time $i$ to time $i - L_i$. The definition of $\theta'$ says that no arrow starting from $\{\theta', \dots, n\}$ goes beyond time $\theta'$. This means that the construction of $X_{\theta'}^n$ can be performed recursively from time $\theta'$ to time $n$ using only $U_{\theta'}^n$, and therefore that $\xi(U_{\theta'}^n) = 1$. Since $\theta[0,n]$ is the maximum over $\{k \le 0 : \xi(U_k^n) = 1\}$, it follows that (14) holds.
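Definition (13) is a search over candidate starting times. The sketch below (hypothetical function name; arrow lengths given as a dictionary mapping each time of a finite window to $L_i$) returns the largest $j \le 0$ such that no arrow starting in $\{j, \dots, n\}$ goes beyond time $j$, or `None` if no such $j$ exists inside the window.

```python
def theta_lower(L, t_min, n):
    """theta^{|w|=1}[0, n] from (13): the largest j <= 0 such that every arrow
    i -> i - L[i], for i = j, ..., n, ends at or after time j.

    L maps each time t_min, ..., n to its arrow length (None = unknown).
    Returns None if no such j exists in [t_min, 0], i.e. the window is too short.
    """
    for j in range(0, t_min - 1, -1):          # candidates j = 0, -1, ..., t_min
        if all(L[i] is not None and L[i] <= i - j for i in range(j, n + 1)):
            return j
    return None


# Arrow lengths on the window {-3, ..., 2}; note L[j] must be 0 for j to qualify.
L = {-3: 0, -2: 1, -1: 0, 0: 2, 1: 1, 2: 3}
print(theta_lower(L, -3, 2), theta_lower(L, -2, 2))
```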
6.3. Simplification of the problem (continued)

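To keep the same form as $\theta^{|w|=1}[0,n]$ in the general case where $|w| > 1$, we introduce the time-rescaled chain $\bar Z$ defined by

$$\bar Z_m := \begin{cases} 1 & \text{if } U_{m|w|-i+1} \in J(w_{-i}|\emptyset),\ i = 1, \dots, |w|, \\ \star & \text{otherwise,} \end{cases} \tag{15}$$

(that is, $\bar Z_m = 1$ if and only if $Z_{(m-1)|w|+1}^{m|w|} = w$), and the function

$$\bar\ell(i) := \left\lceil \frac{\ell^w\big((i+1)|w|-1\big)}{|w|} \right\rceil, \tag{16}$$

where for any $r \in \mathbb{R}$, $\lceil r \rceil$ denotes the smallest integer greater than or equal to $r$. Using these new definitions, we introduce the rescaled random variables

$$\bar m_i := \inf\{k \ge 0 : \bar Z_{i-k-1} = 1\},$$

which is the distance to the last occurrence of 1 in $\bar Z_{-\infty}^{i-1}$, and

$$\bar L_i := \begin{cases} 0 & \text{if } \bar Z_i = 1, \\ \bar m_i + 1 + \bar\ell(\bar m_i) & \text{otherwise.} \end{cases} \tag{17}$$

We are now able to define our new random time:

$$\bar\theta[0,n] := \max\{j \le 0 : \bar L_i \le i - j,\ i = j, \dots, n\}. \tag{18}$$

Observe that in the case where $|w| = 1$, this definition is equivalent to definition (13).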
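A Python sketch of the rescaling (15) to (17), assuming $Z$ is given as a dictionary over a contiguous window of sites and that blocks only partly inside the window are discarded. The helper names and the example window are hypothetical; ceilings are computed with integer arithmetic to avoid floating-point rounding.

```python
STAR = "*"  # hypothetical marker for the star symbol


def rescale(Z, w):
    """Time-rescaled chain (15): bar Z_m = 1 if the block of sites
    (m-1)|w|+1, ..., m|w| of Z spells w, and STAR otherwise.

    Z is a dict site -> symbol (or STAR) on a contiguous range of sites;
    only blocks lying entirely inside that range are returned.
    """
    lw = len(w)
    lo, hi = min(Z), max(Z)
    m_min = -(-(lo - 1) // lw) + 1      # smallest m with (m-1)|w|+1 >= lo
    m_max = hi // lw                     # largest m with m|w| <= hi
    return {m: 1 if [Z[(m - 1) * lw + k] for k in range(1, lw + 1)] == list(w) else STAR
            for m in range(m_min, m_max + 1)}


def make_bar_ell(ell, lw):
    """(16): bar ell(i) = ceil(ell^w((i+1)|w| - 1) / |w|)."""
    return lambda i: -(-ell((i + 1) * lw - 1) // lw)


def bar_arrow_length(barZ, i, bar_ell):
    """(17): 0 if bar Z_i = 1, otherwise bar m_i + 1 + bar ell(bar m_i);
    None if no 1 occurs before block i inside the window."""
    if barZ[i] == 1:
        return 0
    k = 0
    while (i - k - 1) in barZ:
        if barZ[i - k - 1] == 1:
            return k + 1 + bar_ell(k)
        k += 1
    return None


Z = {1: "a", 2: "b", 3: STAR, 4: "a", 5: "c", 6: STAR}   # hypothetical window, w = "ab"
barZ = rescale(Z, ["a", "b"])
bar_ell = make_bar_ell(lambda m: m, 2)                     # hypothetical ell^w(m) = m
print(barZ, [bar_arrow_length(barZ, i, bar_ell) for i in sorted(barZ)])
```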



Figure 5: Illustration of two coupled samples $Z$ (top) and $\bar Z$ (bottom) in a case where the reference string $w$ has size 6 and $A = \{a, b, c\}$, with the values of some of the variables $m_j$ and $\bar m_i$ indicated. This figure illustrates the inequalities (19) and (20).

Lemma 6.1. For any $n \ge 0$, we have on the set $\{U : \bar\theta[0,n] > -\infty\}$

$$\theta[0, n|w|] \ge |w|(\bar\theta[0,n] - 1) + 1.$$

We refer to Subsection 6.4 for an example illustrating this Lemma.

Proof. The way we defined $m_j$ and $\bar m_i$ implies that for $j \in \{(i-1)|w|+1, \dots, i|w|\}$

$$m_j \le (\bar m_i + 1)|w| - 1 - (i|w| - j) \tag{19}$$

and hence

$$m_j \le (\bar m_i + 1)|w| - 1. \tag{20}$$

We refer to Figure 5 for a pictorial illustration of these inequalities. It follows that, whenever $\bar L_i > 0$, we have the following sequence of inequalities for any $j$ in the set of sites $\{(i-1)|w|+1, \dots, i|w|\}$ (recall that $\ell^w$ is increasing):

$$\begin{aligned} L_j &\le m_j + |w| + \ell^w(m_j) \\ &\le m_j + |w| + \ell^w\big((\bar m_i + 1)|w| - 1\big) && \text{by inequality (20)} \\ &\le m_j + |w| + |w|\,\bar\ell(\bar m_i) && \text{by definition (16)} \\ &\le |w| - 1 - (i|w| - j) + |w|\,\bar L_i && \text{by inequality (19).} \end{aligned} \tag{21}$$

This indicates that, whenever $\bar L_i > 0$, the arrow starting at time $j$ has length at most $|w| - 1 - (i|w| - j) + |w|\bar L_i$. In the case where $\bar L_i = 0$, we have $L_j = 0$ for $j \in \{(i-1)|w|+1, \dots, i|w|\}$. Denoting $\bar\theta := \bar\theta[0,n]$, the last line of (21) yields the following inclusion:

$$\bigcap_{i=\bar\theta}^{n} \{\bar L_i \le i - \bar\theta\} \subset \bigcap_{i=\bar\theta}^{n}\ \bigcap_{j=(i-1)|w|+1}^{i|w|} \{L_j \le j - |w|(\bar\theta - 1) - 1\},$$

meaning that in the sequence $Z$, none of the arrows starting from the set of sites $\{|w|(\bar\theta - 1) + 1, \dots, n|w|\}$ will pass time $|w|(\bar\theta - 1) + 1$.

□
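The chain of inequalities (21) can also be checked empirically. The sketch below draws random windows of $Z$ (each site is spontaneous with a hypothetical probability `eps`, in which case its symbol is uniform over a hypothetical alphabet) and asserts the last line of (21) for every block $i$ with $\bar L_i > 0$ whose $\bar m_i$ is determined by the window. As in the proof, it assumes $\ell^w$ is increasing; $\ell^w(m) = 2m$ and $w = aba$ are arbitrary illustrative choices.

```python
import random

STAR = "*"


def dist_to_last(seq, i, pattern):
    """Smallest k >= 0 with seq[i-k-len(pattern) : i-k] == pattern, else None."""
    p = len(pattern)
    for k in range(0, i - p + 1):
        if seq[i - k - p:i - k] == pattern:
            return k
    return None


def check_inequality_21(w, ell, n_blocks, alphabet, eps, rng):
    """Draw a random window of Z and assert the last line of (21).

    List index idx stands for site idx + 1, so block i covers indices
    (i-1)|w|, ..., i|w|-1. Returns the number of pairs (i, j) checked."""
    lw = len(w)
    Z = [rng.choice(alphabet) if rng.random() < eps else STAR for _ in range(n_blocks * lw)]
    barZ = [1 if Z[(i - 1) * lw:i * lw] == w else STAR for i in range(1, n_blocks + 1)]
    checked = 0
    for i in range(1, n_blocks + 1):
        if barZ[i - 1] == 1:
            continue                                   # bar L_i = 0: nothing to check
        bm = dist_to_last(barZ, i - 1, [1])
        if bm is None:
            continue                                   # bar m_i not determined by the window
        bar_L = bm + 1 + (-(-ell((bm + 1) * lw - 1) // lw))   # definitions (16)-(17)
        for j in range((i - 1) * lw + 1, i * lw + 1):         # sites of block i
            idx = j - 1
            if Z[idx] != STAR:
                L_j = 0
            else:
                m_j = dist_to_last(Z, idx, w)          # determined, since bar m_i is
                L_j = m_j + lw + ell(m_j)              # definition (11)
            assert L_j <= lw - 1 - (i * lw - j) + lw * bar_L
            checked += 1
    return checked


rng = random.Random(0)
total = sum(check_inequality_21(["a", "b", "a"], lambda m: 2 * m, 40, ["a", "b", "c"], 0.8, rng)
            for _ in range(200))
print("pairs checked:", total)
```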



6.4. Example illustrating Lemma 6.1

The lower part of Figure 6 illustrates Lemma 6.1 in a case where our reference string $w$ has size $|w| = 3$ and $A = \{a, b, c\}$. We represented two coupled samples $Z_{-38}^{6}$ and $\bar Z_{-12}^{2}$. The underlying sample $U_{-38}^{6}$ with which both sequences are coupled is not represented here. We use the function $\ell^w$ having the following values:

$$\ell^w(0) = 0,\ \ell^w(1) = 0,\ \ell^w(2) = 2,\ \ell^w(3) = 2,\ \ell^w(4) = 3,\ \ell^w(5) = 4,\ \ell^w(6) = 7,\ \ell^w(7) = 8,\ \ell^w(8) = 12.$$

The function $\bar\ell$ has the following values, calculated with $\bar\ell(i) = \lceil \ell^w((i+1)|w| - 1)/|w| \rceil$:

$$\bar\ell(0) = 1,\quad \bar\ell(1) = 2,\quad \bar\ell(2) = 4. \tag{22}$$

The arrows starting at time $i$ represent $L_i$ in the upper sequence, and $\bar L_i$ in the lower one. The rectangle delimits the sample we want to construct. This example shows that we can use the sequence $\bar Z$ to define the lower bound $(\bar\theta[0,2] - 1)|w| + 1 = -38$ for $\theta[0,6]$. However, this lower bound is quite crude, since we observe that the perfect simulation can be done from time $-5$ instead of time $-38$. It is even possible that the perfect simulation can be performed from time $-1$, but we cannot check this here, because this requires the knowledge of the context tree and of $U$.
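The values in (22) and the bound $-38$ can be reproduced directly. The value $\bar\theta[0,2] = -12$ used below is not read off the figure; it is the value for which the bound of Lemma 6.1, $|w|(\bar\theta[0,2]-1)+1$, equals $-38$. Function names are hypothetical.

```python
# Values of ell^w used in Subsection 6.4 (|w| = 3).
ELL_W = {0: 0, 1: 0, 2: 2, 3: 2, 4: 3, 5: 4, 6: 7, 7: 8, 8: 12}
W_LEN = 3


def bar_ell(i):
    """(16): bar ell(i) = ceil(ell^w((i+1)|w| - 1) / |w|), with integer arithmetic."""
    return -(-ELL_W[(i + 1) * W_LEN - 1] // W_LEN)


def lemma_6_1_bound(bar_theta, w_len):
    """Right-hand side of Lemma 6.1: |w| (bar theta[0, n] - 1) + 1."""
    return w_len * (bar_theta - 1) + 1


print([bar_ell(i) for i in range(3)])   # the values in (22)
print(lemma_6_1_bound(-12, W_LEN))      # lower bound for theta[0, 6]
```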

6.5. An auxiliary process to study $\bar\theta[0,n]$

For any $n \in \mathbb{Z}$, let us consider the $\mathbb{N}$-valued stochastic chain $D^{(n)}$ defined by $D_i^{(n)} = 0$ for any $i \le n$ and

$$D_i^{(n)} = \big(i - \bar L_i - l_i\big) \vee 0, \quad \forall i \ge n+1, \tag{23}$$

where $l_i := \max\{l < i : D_l^{(n)} = 0\}$. In words, $D_i^{(n)} = 0$ if and only if the arrow starting at time $i$ reaches time $l_i$ or goes beyond it. We refer to Figure 6 for an illustration of this process in the case of the example of Subsection 6.4. In this figure, we illustrated the samples $D^{(i)}$ for $i = -8, -11, -13$, related to the sample $\bar Z_{-12}^{2}$.

As we informally mentioned in the introduction of the present Section, we introduce this new auxiliary stochastic chain in order to study the distribution of $\bar\theta$. The relation between $\bar\theta[0,n]$ and $D^{(n)}$ is made clear by the next Lemma. Let us mention that $D^{(n)}$ plays the same role as the house of cards process $W$ introduced in Comets et al. (2002, Section 5).
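A Python sketch of the auxiliary chain (23), assuming the form of (23) given above with $l_i$ the last zero before time $i$. As a sanity check, it compares, on random arrow lengths $(\bar L_i)$ (arbitrary values, not drawn from $\bar Z$), the event $\{\bar\theta[0,n] < -l\}$ with $\bigcup_{k=0}^{n}\{D_k^{(-l-1)} = 0\}$, which is identity (27) in the proof of Lemma 6.2. The function names are hypothetical.

```python
import random


def d_chain(bar_L, n, t_end):
    """D^{(n)} from (23): D_i = 0 for i <= n and, for n < i <= t_end,
    D_i = max(i - bar_L[i] - l_i, 0), with l_i the last time before i where D = 0.
    Only the values for i = n+1, ..., t_end are returned."""
    D = {}
    last_zero = n
    for i in range(n + 1, t_end + 1):
        D[i] = max(i - bar_L[i] - last_zero, 0)
        if D[i] == 0:
            last_zero = i
    return D


def bar_theta(bar_L, t_min, n):
    """(18) restricted to the window [t_min, n]; None means bar theta[0, n] < t_min."""
    for j in range(0, t_min - 1, -1):
        if all(bar_L[i] <= i - j for i in range(j, n + 1)):
            return j
    return None


def identity_27_holds(bar_L, l, n):
    """Compare both sides of (27) for one realization of (bar L_i), -l <= i <= n."""
    lhs = bar_theta(bar_L, -l, n) is None                 # {bar theta[0, n] < -l}
    D = d_chain(bar_L, -l - 1, n)
    rhs = any(D[k] == 0 for k in range(0, n + 1))         # some D_k^{(-l-1)} = 0, 0 <= k <= n
    return lhs == rhs


rng = random.Random(1)
trials = 0
for _ in range(5000):
    l, n = rng.randint(0, 6), rng.randint(0, 6)
    bar_L = {i: rng.randint(0, 5) for i in range(-l, n + 1)}
    assert identity_27_holds(bar_L, l, n)
    trials += 1
print("identity (27) held in", trials, "random trials")
```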



Lemma 6.2. For any $l \ge 0$ and $n \ge 0$

$$\mathbb{P}(\bar\theta[0,n] < -l) \le \sum_{k=l+1}^{l+n+1} \mathbb{P}(D_k^{(0)} = 0).$$

Figure 6: Illustration of the random variable $\bar\theta[0,2]$ constructed using $\bar Z$, and of the behavior of the chains $D^{(i)}$ for $i = -13, -11, -8$ (legend: ○ $D^{(-13)}$, • $D^{(-11)}$, × $D^{(-8)}$).
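Proof. It follows directly from the above definition and from definition (18) of $\bar\theta$ that

$$\{\bar\theta[0,n] < -l\} = \bigcap_{i=-l-1}^{-1}\ \bigcup_{k=i+1}^{n} \{D_k^{(i)} = 0\}. \tag{24}$$

We also observe that for any two integers $n_1 < n_2$

$$\{D_i^{(n_1)} = 0\} \subset \{D_i^{(n_2)} = 0\}, \quad \forall i > n_2. \tag{25}$$

Moreover, if $D_i^{(n)} = 0$ for some $i > m > n$, then $D_j^{(n)} = D_j^{(m)}$ for all $j \ge i$. These two facts imply that the sequence of chains $(D^{(n)})_{n\in\mathbb{Z}}$ verifies

$$D_k^{(n)} = D_k^{(m)} \quad \text{on the event } \{D_i^{(n)} = 0\}, \qquad \forall\, n < m < i \le k. \tag{26}$$

This is the coalescence property of the sequence of chains $(D^{(n)})_{n\in\mathbb{Z}}$ illustrated in Figure 6. Using first (26), and then (25), we obtain that for any $l \ge 0$

$$\bigcap_{i=-l-1}^{-1}\ \bigcup_{k=i+1}^{n} \{D_k^{(i)} = 0\} = \bigcap_{i=-l-1}^{-1}\ \bigcup_{k=0}^{n} \{D_k^{(i)} = 0\} = \bigcup_{k=0}^{n} \{D_k^{(-l-1)} = 0\}.$$

It follows from (24) that for any $l \ge 0$ and $n \ge 0$

$$\{\bar\theta[0,n] < -l\} = \bigcup_{k=0}^{n} \{D_k^{(-l-1)} = 0\}. \tag{27}$$

Therefore, using the translation invariance of $\bar Z$, we obtain

$$\mathbb{P}(\bar\theta[0,n] < -l) \le \sum_{k=0}^{n} \mathbb{P}(D_k^{(-l-1)} = 0) = \sum_{k=l+1}^{l+n+1} \mathbb{P}(D_k^{(0)} = 0).$$

□

6.6. Study of $\mathbb{P}(D_k^{(0)} = 0)$

Define the inverse function of $\bar\ell$ by

$$\bar\ell^{-1}(i) := \inf\{k \ge 1 : \bar\ell(k) > i\}, \quad \forall i \ge 0. \tag{28}$$

The aim of this subsection is to prove the following Proposition.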



Proposition 6.1. Let $D^{(0)}$ be the chain defined through (23) using an $\mathbb{N}$-valued function $\bar\ell$ and the i.i.d. chain $\bar Z$ on $\{1, \star\}$ with distribution $(\epsilon^{|w|}, 1 - \epsilon^{|w|})$. Then the sequence $u_k := \mathbb{P}(D_k^{(0)} = 0)$

1. is summable when $(1-\epsilon^{|w|})^{\bar\ell^{-1}(i)}$ is summable,

2. decreases exponentially fast when $(1-\epsilon^{|w|})^{\bar\ell^{-1}(i)}$ decreases exponentially fast.

Proof. Here we use the proof given in (Bressaud et al. 1999, Proposition 2) for the house of cards process. Denote by $\zeta$ the first time larger than 0 at which $D^{(0)}$ touches 0, and by $f_k$ the probability $\mathbb{P}(\zeta = k)$. First of all, we observe that the state 0 is a renewal state for the chain $D^{(0)}$. It follows that the sequences $(u_k)_{k\ge 1}$ and $(f_k)_{k\ge 1}$ satisfy (with the convention $u_0 := 1$)

$$u_k = \sum_{i=1}^{k} f_i\, u_{k-i}. \tag{29}$$

By (29), the series

$$F(s) := \sum_{n\ge 1} f_n s^n \quad\text{and}\quad U(s) := \sum_{n\ge 0} u_n s^n$$

are related through

$$U(s) = \frac{1}{1 - F(s)} \tag{30}$$

for $s \ge 0$ such that $F(s) < 1$ (see for example Feller 1968, chap. XIII.10, Theorem 1). In order to prove statement (1), all we need to prove is that the state 0 is transient, that is $F(1) < 1$, whenever $(1-\epsilon^{|w|})^{\bar\ell^{-1}(i)}$ is summable. Suppose that for some $M \ge 1$ we have $\bar Z_1^M = 1^M$, so that in particular $D_M^{(0)} = M$. The first arrow which can possibly reach time 0 or go beyond it is the one starting at time $M + \bar\ell^{-1}(M-1)$. This follows from the definition (28) of $\bar\ell^{-1}(M-1)$. Then, in order for the chain $D^{(0)}$ to touch 0 at the first possible time after $M$, it is necessary that $\bar\ell^{-1}(M-1)$ stars appear in $\bar Z$ from time $M+1$ to time $M + \bar\ell^{-1}(M-1)$. This is made clear by Figure 6. More specifically, we have for any integer $M \ge 1$

$$\bigcup_{i\ge 1}\{\zeta = i\} \cap \{\bar Z_1^M = 1^M\} \subset \bigcup_{i\ge M}\Big\{\bar Z_{i+1}^{i+\bar\ell^{-1}(i-1)} = \star^{\bar\ell^{-1}(i-1)}\Big\} \cap \{\bar Z_1^M = 1^M\}. \tag{31}$$

Using the partition

$$\bigcup_{i\ge 1}\{\zeta = i\} = \Big(\bigcup_{i\ge 1}\{\zeta = i\} \cap \{\bar Z_1^M = 1^M\}\Big) \cup \Big(\bigcup_{i\ge 1}\{\zeta = i\} \cap \{\bar Z_1^M \ne 1^M\}\Big), \quad \forall M \ge 1,$$

one obtains the following simple upper bound:

$$\mathbb{P}\Big(\bigcup_{i\ge 1}\{\zeta = i\}\Big) \le \mathbb{P}\Big(\bigcup_{i\ge M}\Big\{\bar Z_{i+1}^{i+\bar\ell^{-1}(i-1)} = \star^{\bar\ell^{-1}(i-1)}\Big\} \cap \{\bar Z_1^M = 1^M\}\Big) + \mathbb{P}(\bar Z_1^M \ne 1^M).$$

The events $\bigcup_{i\ge M}\{\bar Z_{i+1}^{i+\bar\ell^{-1}(i-1)} = \star^{\bar\ell^{-1}(i-1)}\}$ and $\{\bar Z_1^M = 1^M\}$ are independent since the $\bar Z_i$'s are i.i.d. Therefore, for any $M \ge 1$

$$F(1) = \sum_{i\ge 1} f_i \le \epsilon^{M|w|} \sum_{i\ge M-1} (1-\epsilon^{|w|})^{\bar\ell^{-1}(i)} + 1 - \epsilon^{M|w|}.$$

If $\sum_{i\ge 0}(1-\epsilon^{|w|})^{\bar\ell^{-1}(i)} < +\infty$, we can take $M$ sufficiently large to ensure that $\sum_{i\ge M-1}(1-\epsilon^{|w|})^{\bar\ell^{-1}(i)} < 1$. Thus $\sum_{i\ge 0}(1-\epsilon^{|w|})^{\bar\ell^{-1}(i)} < +\infty$ implies $\sum_{i\ge 1} f_i < 1$, concluding the proof of statement (1).

For the proof of statement (2), let us suppose that $(1-\epsilon^{|w|})^{\bar\ell^{-1}(i)}$ decreases exponentially fast. Then, on the one hand, $D^{(0)}$ is transient and therefore $F$ and $U$ are related through (30); on the other hand, $\bar\ell^{-1}(i)$ grows at least linearly in $i$. It follows that $f_n$ decreases exponentially fast, so that the radius of convergence of $F$, namely $\big(\limsup_{n\to+\infty} f_n^{1/n}\big)^{-1}$, is strictly larger than 1. Since $F(1) = \mathbb{P}(\zeta < +\infty) < 1$, it follows by continuity that there exists a real number $s_0 > 1$ such that $F(s_0) < 1$. By (30), this means that $U(s) < +\infty$ for $s < s_0$ and, by the definition of $U$, it implies that $u_n$ decreases faster than $r^n$ for any $r \in (s_0^{-1}, 1)$. □
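The renewal relation (29) and the generating-function identity (30) are easy to check numerically. The sketch below uses a hypothetical first-return law $f_1 = 0.3$, $f_2 = 0.2$ (so $F(1) = 0.5 < 1$), not one derived from $D^{(0)}$, together with the convention $u_0 = 1$. Since this $F$ is a polynomial with $F(1.2) = 0.648 < 1$, the identity also holds at $s = 1.2 > 1$, and $u_n$ decays geometrically, in line with statement (2).

```python
def renewal_sequence(f, K):
    """Solve the renewal equation (29): u_0 = 1 and u_k = sum_{i=1}^{k} f_i u_{k-i}.

    f is a list with f[i] = P(zeta = i) for i >= 1 (f[0] is ignored);
    f_i is taken to be 0 beyond the end of the list.
    """
    u = [1.0]
    for k in range(1, K + 1):
        u.append(sum(f[i] * u[k - i] for i in range(1, min(k, len(f) - 1) + 1)))
    return u


def power_series(coeffs, s):
    """Evaluate sum_n coeffs[n] * s**n (truncated at len(coeffs))."""
    return sum(c * s ** n for n, c in enumerate(coeffs))


# Hypothetical first-return law: the state 0 is transient since F(1) = 0.5 < 1.
f = [0.0, 0.3, 0.2]
u = renewal_sequence(f, 300)
for s in (0.5, 1.0, 1.2):
    print(s, power_series(u, s), 1.0 / (1.0 - power_series(f, s)))
```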
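6.7. Proof of the theorem

The proof of the theorem is now simple. We will show that for any $0 \le n < +\infty$, $\mathbb{P}(\theta[0,n] < -l)$ converges to 0 when $l$ diverges. Since $\theta[0,n] \ge \theta[0, \lceil n/|w| \rceil |w|]$, Lemma 6.1 gives

$$\mathbb{P}(\theta[0,n] < -l) \le \mathbb{P}\Big(\bar\theta\big[0, \lceil n/|w| \rceil\big] < -\lfloor l/|w| \rfloor + 1\Big),$$

where $\lfloor r \rfloor$ denotes the integer part of $r$, and one obtains by Lemma 6.2

$$\mathbb{P}(\theta[0,n] < -l) \le \sum_{k=\lfloor l/|w|\rfloor}^{\lfloor l/|w|\rfloor + \lceil n/|w|\rceil} u_k \tag{32}$$

for any $l$ such that $\lfloor l/|w| \rfloor \ge 1$. Under the conditions of Theorem 5.1 we have

$$\limsup_{k\to\infty} \frac{\log \ell^w(k)}{(k/|w|)\,\log\big(1/(1-\epsilon^{|w|})\big)} < 1,$$

therefore, there exists a real number $\alpha > 0$ such that for any $k$ sufficiently large

$$\ell^w(k) \le \Big(\frac{1}{1-\epsilon^{|w|}}\Big)^{k/[|w|(1+\alpha)]}.$$

It follows that $\bar\ell(k) \le \big(1/(1-\epsilon^{|w|})\big)^{(k+1)/(1+\alpha)}$ for $k$ sufficiently large and, by the definition (28) of $\bar\ell^{-1}$,

$$\bar\ell^{-1}(n) \ge \frac{1+\alpha}{\log\big(1/(1-\epsilon^{|w|})\big)}\log n - 1$$

for any $n$ sufficiently large. Therefore there exists $n^*$ such that

$$\sum_{n\ge 0}(1-\epsilon^{|w|})^{\bar\ell^{-1}(n)} \le \sum_{n=0}^{n^*}(1-\epsilon^{|w|})^{\bar\ell^{-1}(n)} + (1-\epsilon^{|w|})^{-1}\sum_{n > n^*} n^{-1-\alpha},$$

which is finite since $\alpha$ is strictly positive. By Proposition 6.1, this implies that $(u_k)_{k\ge 1}$ is summable. It follows by (32) that $\mathbb{P}(\theta[0] < -l)$ is summable in $l$, and that $\mathbb{P}(\theta[0,n] < -l)$ goes to zero when $l$ diverges, for any $0 \le n < +\infty$. This concludes the proof of Theorem 5.1.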
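The key step from the lower bound on $\bar\ell^{-1}(n)$ to the tail bound $n^{-1-\alpha}$ can be written out as follows. It uses only $1-\epsilon^{|w|} < 1$ (so a larger exponent gives a smaller power) and the identity $(1-\epsilon^{|w|})^{x} = \exp\big(-x\log(1/(1-\epsilon^{|w|}))\big)$ with $x = \frac{1+\alpha}{\log(1/(1-\epsilon^{|w|}))}\log n$:

$$
\begin{aligned}
(1-\epsilon^{|w|})^{\bar\ell^{-1}(n)}
&\le (1-\epsilon^{|w|})^{\frac{1+\alpha}{\log(1/(1-\epsilon^{|w|}))}\log n \,-\, 1}\\
&= (1-\epsilon^{|w|})^{-1}\exp\big(-(1+\alpha)\log n\big)\\
&= (1-\epsilon^{|w|})^{-1}\, n^{-(1+\alpha)},
\end{aligned}
$$

which is summable over $n > n^*$ precisely because $\alpha > 0$.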



7. Proof of Corollary 5.2

We refer to (Comets et al. 2002, Section 8) for a complete proof of the statements of Corollary 5.2. The proofs given therein go mainly along the following lines.



7.1. Existence of a regeneration scheme 

On the one hand, we have $\mathbb{P}(\theta[0,+\infty] = 0) > 0$, which follows from the fact that

$$\mathbb{P}(\theta[0,+\infty] = 0) \ge \mathbb{P}\Big(\bigcap_{i\ge 1}\{D_i^{(0)} \ge 1\}\Big),$$

which is strictly positive, since the state 0 is transient under the conditions of Theorem 5.1 (this is shown in the proof of Proposition 6.1). On the other hand, we have to check that the chain $\xi$, defined by $\xi_j := \mathbf{1}\{j = \theta[j,+\infty]\}$, is a renewal chain. This follows from the fact that, by the definition of the random variable $\theta$, we have for any $t_1 < \dots < t_n$

$$\bigcap_{l=1}^{n}\{\theta[t_l,+\infty] = t_l\} = \bigcap_{l=1}^{n}\{\theta[t_l, t_{l+1}-1] = t_l\}$$

(where we used the convention $t_{n+1} = +\infty$), which is an intersection of independent events, since $\{\theta[t_l, t_{l+1}-1] = t_l\}$ is $\sigma(U_{t_l}^{t_{l+1}-1})$-measurable, for $l = 1, \dots, n$. To conclude, the fact that the random strings $([X(U)]_{T_i}, \dots, [X(U)]_{T_{i+1}-1})_{i\neq 0}$ are i.i.d. follows from the construction using Algorithm 1 or 2.
7.2. Finite expected size

To show that the expected distance between two consecutive 1's in $\xi$ is finite, we observe that, by stationarity and by the definition of the random variable $\theta$,

$$\mathbb{P}(T_{i+1} - T_i > m) = \mathbb{P}(\theta[1,+\infty] < -m \mid \theta[0,+\infty] = 0) = \mathbb{P}(\theta[0] < -m+1),$$

which has been proved to be summable in Subsection 6.7.

8. Proof of Theorem 5.2



Suppose $(\tau,p)$ satisfies the conditions of Theorem 5.1. Denote by $X$ the unique stationary chain which has been constructed with Algorithm 1 or 2. Define the random variable $L_i^X := |c_\tau(X_{-\infty}^{i-1})|$ and denote $\sigma := \lceil \ell^w(|w|-1)/|w| \rceil + 1$. For any integers $m, n$ such that $-\infty < m + \sigma|w| \le n < +\infty$, the visible regeneration time of the window $[m,n]$ is

$$\theta^X[m,n] := \max\{k \le m : X_k^{k+\sigma|w|-1} = w^\sigma \text{ and } L_i^X \le i - k,\ i = k + \sigma|w|, \dots, n\}. \tag{33}$$

Observe that although $L_i^X := |c_\tau(X_{-\infty}^{i-1})|$, the event $\{\theta^X[m,n] = k\}$ is measurable with respect to the $\sigma$-algebra generated by $X_k^n$. Requiring $X_k^{k+\sigma|w|-1} = w^\sigma$ ensures that there exist realizations of $X$ such that $\theta^X[m,n] > -\infty$. To see this, observe that we can concatenate one more $w$ to these $\sigma$ consecutive $w$'s without needing to know more than $w^\sigma$: for $i = 0, \dots, |w| - 2$,

$$\begin{aligned} |c_\tau(w^{\sigma} w_{-|w|} \cdots w_{-|w|+i})| &= \ell^w(i+1) + |w| + i + 1 \\ &\le \ell^w(|w|-1) + |w| + i + 1 \\ &\le \sigma|w| + i + 1. \end{aligned} \tag{34}$$

We say that time $t$ is a visible regeneration time for the chain $X$ if $\theta^X[t,+\infty] = t$. Finally, define $\xi^X$ and $T^X$ using $\theta^X[t,+\infty]$ in the same way we defined $\xi$ and $T$ using $\theta[t,+\infty]$. What we want to show is that (i) $\theta^X[0,+\infty]$ is almost surely finite and (ii) $\xi^X$ is a renewal chain, with finite expected distance between two consecutive 1's.
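Definition (33) only involves the observed chain, which is the whole point of a visible regeneration time. The sketch below evaluates it on a finite realization under a hypothetical assumption: the context tree is one in which every context contains $w$ and has length $m + |w| + \ell^w(m)$, $m$ being the distance to the last occurrence of $w$ (compare (12)), so that $L_i^X$ can be computed from the window. List indices play the role of times, the past is truncated at index 0, and the function names are illustrative.

```python
def sigma_of(w, ell):
    """sigma = ceil(ell^w(|w| - 1) / |w|) + 1, with integer arithmetic."""
    lw = len(w)
    return -(-ell(lw - 1) // lw) + 1


def context_length(X, i, w, ell):
    """|c_tau(X[0:i])| for the HYPOTHETICAL tree described above; None if w does
    not occur in X[0:i] (context not determined by the window)."""
    lw = len(w)
    for k in range(0, i - lw + 1):
        start = i - k - lw
        if X[start:start + lw] == w:
            return k + lw + ell(k)
    return None


def visible_theta(X, w, ell, sigma, m, n):
    """theta^X[m, n] from (33): the largest k <= m such that X starts with w^sigma
    at time k and every context needed at times k + sigma|w|, ..., n lies in [k, i-1].
    Returns None if no admissible k is found in the window."""
    lw = len(w)
    block = w * sigma
    for k in range(m, -1, -1):
        if X[k:k + sigma * lw] != block:
            continue
        admissible = True
        for i in range(k + sigma * lw, n + 1):
            L = context_length(X, i, w, ell)
            if L is None or L > i - k:
                admissible = False
                break
        if admissible:
            return k
    return None


ell = lambda m: m                     # hypothetical ell^w
sigma = sigma_of([2], ell)
print(sigma, visible_theta([2, 2, 1, 2, 1], [2], ell, sigma, 1, 4))
```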

We use the sequence $Z$, which tells us where we are sure that $w$ occurs in $X$. The proof of item (i) is quite similar to the proof of Theorem 5.1. The main difference lies in the following new definitions:

$$L'_i := m_i + |w| + \ell^w(m_i),$$

which is always strictly larger than 0, and for any $n \ge \sigma|w|$

$$\theta'[0,n] := \max\{k \le 0 : Z_k^{k+\sigma|w|-1} = w^\sigma \text{ and } L'_i \le i - k \text{ for } i = k + \sigma|w|, \dots, n\}. \tag{35}$$

We can use the proofs given in Section 6 to show that $\theta'[0,n] \le \theta^X[0,n]$, that $\mathbb{P}(\theta'[0,n] < -l)$ goes to zero as $l$ diverges, for any $n < +\infty$, and that $\mathbb{P}(\theta'[0] < -l)$ is summable in $l$. In order to prove item (ii), we can adapt the proof of Comets et al. (2002). Let us denote $v := w^\sigma$ and define, for any $-\infty < m \le n < +\infty$, the events

$$h[m,n] := \bigcap_{i=0}^{|w|-1}\big\{|c_\tau(X_m^n w_{-|w|} \cdots w_{-|w|+i})| \le n + i + 1 - m\big\},$$

which says that one more $w$ can be concatenated to $X_m^n$ without needing to look back before time $m$, and

$$H[m,n] := \big\{X_m^{m+\sigma|w|-1} = v,\ |c_\tau(X_m^{i-1})| \le i - m,\ i \in \{m + \sigma|w|, \dots, n\}\big\} \cap h[m,n].$$
Both of them are measurable with respect to the $\sigma$-algebra generated by $X_m^n$. Finally, define

$$H[m,+\infty] := \big\{X_m^{m+\sigma|w|-1} = v,\ |c_\tau(X_m^{i-1})| \le i - m,\ i \ge m + \sigma|w|\big\},$$

which is measurable with respect to the $\sigma$-algebra generated by $X_m^{+\infty}$. By definition (33) of $\theta^X[m,n]$, $-\infty < m \le n < +\infty$, we have that if

$$t_1 + \sigma|w| < t_2,\quad t_2 + \sigma|w| < t_3,\quad \dots,\quad t_{n-1} + \sigma|w| < t_n, \tag{36}$$

then

$$\bigcap_{l=1}^{n}\{\theta^X[t_l,+\infty] = t_l\} = \bigcap_{l=1}^{n} H[t_l, t_{l+1}-1], \tag{37}$$

where $t_{n+1} := +\infty$. This is an intersection of independent events. Then, we observe that by stationarity,

$$\mathbb{P}(H[j,+\infty]) = \mathbb{P}(H[0,+\infty])$$

and

$$\mathbb{P}(H[-j,-1]) = \mathbb{P}(H[-j,+\infty] \mid H[0,+\infty]), \quad \forall j > \sigma|w|.$$

Together with (37), this yields, for any sequence of integers $t_1, \dots, t_n$ verifying (36),

$$\mathbb{P}(\xi^X_{t_l} = 1,\ l = 1, \dots, n) = \mathbb{P}(\xi^X_0 = 1)\prod_{l=1}^{n-1}\mathbb{P}(\xi^X_{t_{l+1}-t_l} = 1 \mid \xi^X_0 = 1),$$

and therefore the chain $\xi^X$ is a renewal chain, proving the existence of the visible regeneration scheme. By stationarity and by definition (33) of the random variable $\theta^X$, we have for any $m > \sigma|w|$

$$\mathbb{P}(T^X_{i+1} - T^X_i > m) = \mathbb{P}(\theta^X[1,+\infty] < -m \mid \theta^X[0,+\infty] = 0) = \mathbb{P}(\theta^X[0] < -m+1),$$

which is summable in $m$, concluding the proof of Theorem 5.2.



9. The complete perfect simulation algorithm, simulation and discussion 

9.1. The algorithm 

Algorithm 1 is "simplistic" in the sense that, in order to compute $\theta[m,n]$, it uses the function $\xi$, which is not explicit. A more complete algorithm is given below. We recall that for any $a_m^n \in A^{n-m+1}$ and $u \in [0,1[$

$$F(u, a_m^n) := \sum_{a\in A} a\,\mathbf{1}\{u \in J(a|c_\tau(a_m^n))\} + \star\,\mathbf{1}\Big\{u \notin \bigcup_{a\in A} J(a|\emptyset),\ c_\tau(a_m^n) = \emptyset\Big\},$$

where if $m = n+1$, then $c_\tau(a_m^n) = c_\tau(\emptyset) = \emptyset$. This function contains all the information we need about the probabilistic context tree $(\tau,p)$, and we suppose that it is already implemented in the software used for programming the algorithm.

The algorithm uses two variables: $i$, which is a time index, and $B$, which is a set of time indexes. The set $B$ keeps track of the set of sites which have to be constructed. It is initialized with $B = \{m, \dots, n\}$, which is the set of time indexes to be constructed, and the algorithm terminates when $B = \emptyset$. In the first "while" loop (lines 2 to 8), we sample $U_m^n$ and directly attempt to construct $[X(U)]_m^n$ using this information. If the algorithm manages to do this, it returns $\theta[m,n] = m$ and the constructed sample $[X(U)]_m^n$. Otherwise, it enters the second "while" loop (lines 10 to 27). In this loop, each time the algorithm cannot construct the next site of $B$, it generates a new uniform random variable backward in time. At each newly generated random variable, the algorithm attempts to go as far as possible in the construction of the remaining sites of $B$ using the uniform random variables that have been previously generated.
Figure 7: An explicit perfect simulation, using a finite sample of $U$ starting at time $-12$. The transition probabilities of the context tree are given at the bottom of the figure. We refer to Figure 4 for a reminder of what the small intervals represent. Both symbols 1 and 2 are 0.2-regular. The transition probabilities are: $p(2|2) = 0.7$, $p(2|121) = 0.3$, $p(2|122) = 0.5$, $p(2|11211) = 0.3$ and $p(2|1112112) = 0.5$.
Algorithm 2 "Explicit" perfect simulation algorithm of the sample $[X(U)]_m^n$

1: Input: $m$, $n$, $F$; Output: $\theta[m,n]$, $([X(U)]_{\theta[m,n]}, \dots, [X(U)]_n)$
2: Sample $U_m, \dots, U_n$ uniformly in $[0,1[$
3: $i \leftarrow m$, $B \leftarrow \{m, \dots, n\}$, $\theta[m,n] \leftarrow m$, $[X(U)]_m^n \leftarrow \star^{n-m+1}$
4: while $B \neq \emptyset$ and $F(U_i, [X(U)]_m^{i-1}) \in A$ do
5:     $[X(U)]_i \leftarrow F(U_i, [X(U)]_m^{i-1})$
6:     $B \leftarrow B \setminus \{i\}$
7:     $i \leftarrow i + 1$
8: end while
9: $j \leftarrow m$
10: while $B \neq \emptyset$ do
11:     $j \leftarrow j - 1$
12:     $B \leftarrow B \cup \{j\}$
13:     Sample $U_j$ uniformly in $[0,1[$
14:     while $U_j \notin \bigcup_{a\in A} J(a|\emptyset)$ do
15:         $j \leftarrow j - 1$
16:         $B \leftarrow B \cup \{j\}$
17:         Sample $U_j$ uniformly in $[0,1[$
18:     end while
19:     $[X(U)]_j \leftarrow F(U_j, \emptyset)$
20:     $B \leftarrow B \setminus \{j\}$
21:     $i \leftarrow \min B$
22:     while $B \neq \emptyset$ and $F(U_i, [X(U)]_j^{i-1}) \in A$ do
23:         $[X(U)]_i \leftarrow F(U_i, [X(U)]_j^{i-1})$
24:         $B \leftarrow B \setminus \{i\}$
25:         $i \leftarrow \min B$
26:     end while
27: end while
28: $\theta[m,n] \leftarrow j$
29: return $\theta[m,n]$, $([X(U)]_{\theta[m,n]}, \dots, [X(U)]_n)$
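A minimal runnable sketch of Algorithm 2 in Python. Everything tree-specific is a hypothetical assumption chosen only to make the code self-contained: the alphabet is $\{1, 2\}$, $w = 2$, $\ell^w(m) = m$, $J(1|\emptyset) = [0, 0.2[$, $J(2|\emptyset) = [0.2, 0.4[$, and $p(2|v)$ is an arbitrary function of the context with values in $[0.2, 0.8]$; it is not the tree of Figure 7. `None` plays the role of $\star$, and the comments map each part of the code to the pseudocode lines.

```python
import random

EPS = 0.2   # hypothetical: both symbols 1 and 2 are EPS-regular
W = 2       # hypothetical reference string w = "2" (|w| = 1)


def ell(m):
    """Hypothetical ell^w: a context whose last w is at distance m has length m + 1 + ell(m)."""
    return m


def q(context):
    """Hypothetical p(2 | context), kept inside [EPS, 1 - EPS]."""
    return EPS + (1 - 2 * EPS) * context.count(W) / len(context)


def context_of(past):
    """c_tau(past) for the hypothetical tree, or None if past is too short (c_tau = empty)."""
    for m, sym in enumerate(reversed(past)):        # past[-1] is the most recent symbol
        if sym == W:
            length = m + 1 + ell(m)
            return past[-length:] if len(past) >= length else None
    return None


def F(u, past):
    """Update function: spontaneous intervals first, then the context-dependent part."""
    if u < EPS:
        return 1
    if u < 2 * EPS:
        return 2
    ctx = context_of(past)
    if ctx is None:
        return None                                  # the star: not decidable yet
    return 1 if u < 2 * EPS + (1 - q(ctx) - EPS) else 2


def spontaneous(u):
    return u < 2 * EPS


def advance(U, X, B, start):
    """Lines 4-8 and 21-26: construct the smallest pending site while F decides it."""
    while B:
        i = min(B)
        a = F(U[i], [X[t] for t in range(start, i)])
        if a is None:
            return
        X[i] = a
        B.discard(i)


def perfect_sample(m, n, rng):
    """Sketch of Algorithm 2: returns theta[m, n] and the sample X_theta, ..., X_n."""
    U = {t: rng.random() for t in range(m, n + 1)}   # line 2
    X, B = {}, set(range(m, n + 1))                   # line 3
    advance(U, X, B, m)                               # lines 4-8
    j = m                                             # line 9
    while B:                                          # line 10
        j -= 1                                        # lines 11-13
        B.add(j)
        U[j] = rng.random()
        while not spontaneous(U[j]):                  # lines 14-18
            j -= 1
            B.add(j)
            U[j] = rng.random()
        X[j] = F(U[j], [])                            # lines 19-20
        B.discard(j)
        advance(U, X, B, j)                           # lines 21-26
    return j, [X[t] for t in range(j, n + 1)]         # lines 28-29


theta, sample = perfect_sample(0, 9, random.Random(2024))
print(theta, sample)
```

Keeping $B$ as a Python set and recomputing `min(B)` mirrors the pseudocode; for long windows a heap would avoid the repeated linear scans.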



9.2. Simulation 

We will say that the algorithm makes a step each time it "enters" a "while" loop. In Figure 7, this corresponds to the number of arrows, plus one. The total number of steps $N[m,n]$ needed for the construction of a sample $X_m^n$ is

$$N[m,n] = (n - m + 1) + 2 \times (m - \theta[m,n]).$$

Let us denote by $C$ the maximum number of operations the algorithm needs in order to make a step. Suppose we want to construct a sample $X_0^{n-1}$; then the expected number of operations is bounded above by

$$C \times \big(n + 2 \times \mathbb{E}|\theta[0,n-1]|\big).$$

Figure 7 illustrates an explicit perfect simulation of this chain using a finite sample of $U$, in the case where $\epsilon = 0.2$ and both symbols are $\epsilon$-regular.

The results of the present paper tell us that, under the conditions of Theorem 5.1, this expectation is finite. However, they give no insight into how large it can be. This is due to the fact that we did not manage to obtain sufficiently good explicit bounds for the probability of return to 0 of the chain $D^{(0)}$. This is also the case in Comets et al. (2002). In any case, this bound should depend strongly on the parameter $\epsilon$, as in Comets et al. (2002) (see (2.4) and (2.5) therein; their $a_0$ corresponds basically to the total length of the intervals $\bigcup_{a\in A} J(a|\emptyset)$). In the absence of such bounds, we implemented the above pseudo-code in the case of the context tree of Section 3. We assume that $p(2|v) = \epsilon$ for any $v \in \tau$. Notice that this case corresponds to the i.i.d. chain with probability $\epsilon$ of getting the symbol 2. This assumption considerably simplifies the implementation of the algorithm and gives us the largest possible regeneration times within the class of probabilistic context trees for which the symbol 2 is $\epsilon$-regular. We used increasing values of $\epsilon$, from 0.2 to 1, and for each value we computed the mean over 10,000 iterations of Algorithm 2. The resulting graph is given in Figure 8. From it, we can derive the corresponding expected number of steps $\mathbb{E}N[0]$ realized by the algorithm, as well as the expected number of operations.

Figure 8: Graph representing the influence of the value of $\epsilon$ on the quantity $\mathbb{E}|\theta[0]|$. The x-axis represents the successive values of $\epsilon$, from 0.2 to 1. The y-axis represents $\mathbb{E}|\theta_\epsilon[0]|$, the expected value of $|\theta[0]|$ when the latter is computed using the value $\epsilon$.



10. Final comments and references 

The first study of chains of infinite order seems to go back to the seminal paper of Onicescu and Mihoc (1935). They called these chains "chaînes à liaisons complètes" (chains with complete connections). Then Doeblin and Fortet (1937) proved results on the speed of convergence towards the invariant measure under continuity conditions. We mention without further details some works such as Harris (1955), Lalley (1986), Berbee (1987), Bressaud et al. (1999) and Johansson and Öberg (2003), among others. We refer to the book of Iosifescu and Grigorescu (1990) for a complete review of the area, and to the book of Fernández et al. (2001) for an introduction to the constructive approach.



Chains with variable length memory were introduced by Rissanen (1983) as a universal model for data compression. They have been shown to have great applicability in statistical inference and modeling. For a review and references in this area, we refer to the paper by Galves and Löcherbach (2008).



On perfect simulation using the CFTP method, we refer to the webpage of Prof. David Bruce Wilson: http://dbwilson.com/exact/. The reader will find therein an extensive list of publications in the area. A very interesting issue, related to our work, is whether or not the $\epsilon$-regularity (or the weak non-nullness) assumption is necessary for the existence of a (not necessarily practical) CFTP algorithm. We mention that necessary conditions exist for Markov chains: Foss and Tweedie (1998) showed that such a coupling exists if and only if the Markov chain is uniformly ergodic.